Continuous keystroke identity authentication method based on keyboard position function encoding and attention

Uses keyboard position function encoding and a multi-sliding sequence method, combined with a neural network model based on feature attention and rhythm attention, to solve the problems of insufficient feature extraction and poor stability in dynamic environments found in existing technologies, achieving efficient continuous authentication.

CN121389087B · Active · Publication Date: 2026-06-16 · BEIJING UNIV OF POSTS & TELECOMM

Cites: 2 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: BEIJING UNIV OF POSTS & TELECOMM
Filing Date: 2025-04-10
Publication Date: 2026-06-16

AI Technical Summary

Technical Problem

Existing persistent authentication technologies based on keystroke behavior features exhibit performance degradation in free text, insufficient feature extraction leading to overfitting and poor generalization performance, inability to effectively distinguish user keystroke rhythm, and poor stability in dynamic environments.

Method used

Feature augmentation is achieved with keyboard position function encoding and a multi-sliding sequence method. These are combined with a neural network model built on feature attention and rhythm attention and trained with multiple fully connected layers and Dropout. A continuous verification algorithm performs dynamic weighting and confidence evaluation to improve the model's generalization ability and robustness.

Benefits of technology

It improves the model's sensitivity and accuracy to user keystroke rhythm, reduces the risk of overfitting, enhances adaptability and accuracy in dynamic environments, reduces false rejection rate, and achieves efficient continuous authentication.

✦ Generated by Eureka AI based on patent content.


Figure CN121389087B_ABST


Abstract

The application discloses a continuous keystroke identity authentication method based on keyboard position function coding and attention, belonging to the field of information technology. The specific steps are as follows: first, user keystroke data are collected and vectorized; then, the keystroke data are feature-expanded through keyboard position function coding; next, the keystroke data are divided by a multi-sliding sequence method into three input sequences of different lengths, each of which is input into a neural network model; the neural network model is trained on the input data, keystroke rhythm is extracted by a feature attention module and a rhythm attention module, and the model output probability is obtained through an activation function. In the identity authentication stage, keystroke data are collected in real time and vectorized, then input into the trained neural network model to obtain the output probability, and a continuous authentication algorithm based on time weighting, dynamic fluctuation and confidence evaluation adjusts this probability to obtain the final identity authentication probability.


Description

Technical Field

[0001] This invention belongs to the field of information technology and relates to a method for persistent keystroke authentication based on keyboard position function encoding and attention.

Background Technology

[0002] In today's era of deep integration of information technology and digitalization, technologies such as remote work, cloud computing, and the Internet of Things are becoming increasingly widespread. User device access scenarios are becoming more and more complex. As two pillars for building information security defenses, information encryption and identity verification technologies are becoming increasingly important, with identity verification playing a crucial role as the first line of defense.

[0003] Static passwords are the most widely used authentication method in daily life. For ease of memorization, users often choose simple combinations such as "name + birthday" or "common word + digits", which dictionary attacks can crack within hours, posing a high risk. Even strong, complex passwords can still be stolen through phishing attacks. A second class of methods provides protection through physical devices or communication links, but involves a trade-off between deployment cost and user experience: hardware tokens must be carried separately and are easily lost, while SMS verification codes are limited by network latency and signal coverage and are completely compromised if the phone is lost. Biometrics rely on the uniqueness of physiological characteristics to achieve high-precision authentication, but they must ensure accuracy without violating privacy. Furthermore, deploying biometrics on desktops faces challenges: the hardware cost of dedicated sensors (such as fingerprint modules and infrared cameras) limits their adoption by small and medium-sized enterprises and individual users, and such solutions have many prerequisites and limited applicability and portability. These traditional "verify once at sign-in" solutions are ill-equipped to deal with persistent security threats: they confirm identity only at the moment of login and cannot address the risk of identity theft during subsequent operations. It is therefore necessary to build a protection mechanism that is deeply integrated with user behavior.

[0004] When facing complex intrusion attacks, information security protection needs to shift from simple password defense to a dynamic security system covering the entire lifecycle. Keystroke dynamics verification demonstrates unique advantages in this area. This technology constructs unique behavioral characteristics for each individual by analyzing the micro-behavioral features of the user during typing, transforming identity verification from a "discrete, single check" to "continuous process monitoring." For example, when the system detects abnormal keystroke rhythms, it can automatically trigger secondary authentication or freeze operations to prevent data tampering in a timely manner. This continuous, stealthy protection fills the gaps in one-time verification.

[0005] Keystroke dynamics is particularly applicable in desktop scenarios. Firstly, desktops typically feature physical keyboards, providing hardware support for keystroke data collection. Compared to virtual keyboards on mobile devices, the key travel and feedback mechanism of physical keyboards can generate more stable and distinctive behavioral characteristics. Secondly, desktops are often used for handling highly sensitive tasks, requiring users to operate continuously for extended periods. The traditional "login equals trust" model is easily exploited by attackers. Keystroke verification can seamlessly integrate into the workflow, completing continuous identity verification without the user's awareness, achieving "zero-disruption" continuous authentication and significantly reducing operation interruption rates.

[0006] With remote work and cross-border collaboration becoming the norm, the value of keystroke dynamics verification is becoming increasingly apparent. For example, when users remotely access servers via the SSH protocol, traditional VPN password verification methods have two problems: first, leaked VPN credentials could expose the entire internal network; second, long-term sessions after legitimate login lack dynamic protection. By introducing keystroke verification, the system can continuously analyze a user's command-line operation habits as they enter commands, automatically terminating the session or issuing an administrator alert if abnormal patterns are detected. This real-time protection mechanism is more adaptable to high-risk remote operating environments than traditional periodic password change strategies.

[0007] Almost all user terminals are equipped with input devices such as keyboards. Utilizing the characteristics of these devices for identity verification not only inherits the convenience and security of biometric authentication but also overcomes problems such as strong device dependence, high privacy concerns, and high costs. More importantly, keystroke behavior authentication supports continuous authentication, meaning that users continuously verify their identity while using the terminal, significantly improving anti-counterfeiting and anti-impersonation capabilities. In conclusion, keystroke behavior-based identity verification technology has unique advantages and broad application prospects.

[0008] Early research on keystroke-behavior authentication focused on one-time authentication with fixed text (such as login passwords). While it performs well in that scenario, its performance degrades on free text. Current research emphasizes the processing of temporal features and focuses on algorithm selection and improvement. However, existing solutions still suffer from two problems that limit their practical application: (1) existing models, such as distance-metric matching algorithms, convolutional neural networks, recurrent neural networks, Siamese neural networks, chaotic neural networks, and long short-term memory networks, cannot effectively extract the correlations between features and between keystrokes, and therefore cannot extract the keystroke rhythm; (2) although existing research processes temporal features, it still considers too few features, making the models prone to overfitting and poor generalization.

Summary of the Invention

[0009] When existing models are applied to real-world tests with free text, relying solely on key codes and timing information yields too few features, which easily leads to overfitting and impairs generalization. Furthermore, additional features collected from external devices (such as microphones, cameras, and watches) do not meet the practical requirements of defense systems. While textual semantic information can improve recognition, in system-command and work scenarios the semantic content of different users' input overlaps heavily, making effective differentiation difficult and limiting its usefulness. This invention therefore starts from the keyboard itself, mining its inherent information to achieve feature expansion and behavioral data augmentation.

[0010] In existing technologies, there are certain shortcomings in the feature extraction of user keystroke data in neural network models. The capture of internal correlation information of the data is not sufficient, making it difficult for the model to accurately capture the unique keystroke rhythm of each user. This invention proposes to use feature attention and rhythm attention for multi-scale feature extraction, adaptively focusing on key features, thereby better extracting the user's personalized rhythm information and improving the model's sensitivity to subtle differences.

[0011] To address the problem that existing technologies, which rely on existing features, can easily lead to model overfitting and thus affect generalization performance, this invention proposes keyboard position function encoding to expand features, enhance the behavioral patterns of user keystroke data, further enrich the model's input data, and improve the model's generalization ability.

[0012] To address the issue that existing technologies are susceptible to fluctuations from various factors during use, which not only affect system performance stability but also increase the error rate during verification, this invention proposes a multi-sliding sequence method and a continuous verification algorithm. This method can effectively improve the system's adaptability and accuracy in dynamic environments and enhance its robustness.

[0013] The continuous authentication method based on keystroke behavior characteristics comprises the following specific steps:

[0014] Step 1: The user logs in to the system in advance so that keystroke data can be collected. The system obtains basic keystroke data by listening to keyboard events and capturing the timestamp of each key.

[0015] Step 2: Vectorize the basic keystroke data: pair the press and release events to form complete single-key events, then calculate the various time intervals between adjacent keys to obtain keystroke vector data. The keystroke vectors are then concatenated along the time axis, ordered by their original press timestamps, to form the complete single-key input sequence data.

[0016] Step 3: Enhance the features of the single-key input sequence data through keyboard position function encoding;

[0017] The keyboard position function is encoded based on each user's data, including the following fields:

[0018] 1) Position indicator: indicates whether the key has fixed horizontal and vertical coordinates;

[0019] 2) Frequency of use: keys are graded into levels by their frequency and importance in input, reflecting the user's proficiency;

[0020] 3) Vertical position: marks the row where the key is located, starting from the bottom blank row and increasing sequentially;

[0021] 4) Horizontal position: numerical values for the left and right edges of each key give its horizontal coordinates;

[0022] 5) Left and right hand area division: keys are categorized to distinguish between left-hand and right-hand use;

[0023] 6) Word / sentence breakpoint end marker: marks keys used as word segmentation breakpoints;

[0024] 7) Function area labeling: classifies keys according to their functions.

[0025] Step 4: Divide the keystroke data into three input data sequences of different lengths using the multi-sliding sequence method, and input them into the neural network model respectively;

[0026] The multi-sliding sequence method includes the following three sequences:

[0027] Sequence 1: Divide the single-key input sequence data into data of fixed length L.

[0028] Sequence 2: Based on the user input, extract the word sequence from the single-key input sequence data using a simplified adaptive word segmentation method; the adaptive word segmentation method consists of the following three rules:

[0029] 1) Breakpoint marker: When the input key has a word / phrase breakpoint marker in the keyboard position function encoding, it will be automatically separated.

[0030] 2) Interval time: If the time interval between two keys is too long, the system will automatically identify it as a separation point and perform word segmentation.

[0031] 3) Sequence length: When the length of sequence 2, i.e. the number of keystroke vectors in sequence 2, exceeds the fixed length of sequence 1, the system will automatically divide it into independent subsequences.

[0032] Sequence 3: Divide the single-key input sequence data into individual data represented by a combination of three keys.

[0033] Step 5: Train the neural network model on the input data to obtain a user-specific model. The neural network model is structured as follows: first, all input data are standardized with the user's mean and standard deviation; the data then pass through a feature attention module and a rhythm attention module, followed by multiple fully connected layers with Dropout; finally, the output is passed through an activation function to obtain the model output probability.

[0034] The specific steps of feature attention are as follows: first, receive the input data $X$, and for each pair of adjacent keystroke vectors $x_t$ and $x_{t+1}$, concatenate their features to form a new feature vector $z_t$; then project $z_t$ through a fully connected layer to generate the query vector $Q_t$, which also reduces its dimensionality; then multiply $z_t$ by the learnable weight matrices $W_k$ and $W_v$, respectively, to generate the key vector $K_t$ and the value vector $V_t$; then take the dot product of $Q_t$ and the transpose of $K_t$ to obtain the attention weight $A_t$, normalized with Softmax; then multiply $A_t$ by the transpose of $V_t$ to generate a new feature vector $h_t$; finally, stack the feature vectors $h_t$ of all time steps to form the new output sequence $H$.

[0035] The specific steps of rhythm attention are as follows: first, receive the input data $H$ and add the positional encoding $PE$ along the time dimension to obtain the input feature matrix $H_{input}$. Each sequence of $H_{input}$ is then mapped by a fully connected layer into a lower-dimensional space for dimensionality reduction and compression, giving the dense feature representation $H_{dense}$. For character-level features, $H_{input}$ is used as the input matrix $H_{char}$; for word-level and phrase-level features, one-dimensional convolutions with different kernel sizes extract features, giving the two input matrices $H_{word}$ and $H_{phrase}$. Each of these matrices is multiplied by the learnable weight matrices $W_Q$, $W_K$ and $W_V$ to generate the Q, K and V vectors. Attention weights are computed by scaled dot product to obtain the attention score matrix $A_l$ of each level $l$; $A_l$ is multiplied by the V vectors to produce the output, whose feature distribution is normalized by layer normalization. Finally, the attention outputs of the character, word and phrase levels are concatenated to obtain the final output matrix.

[0036] Step 6: Enter the identity verification stage. Process the data in the same way as in the keystroke data collection stage, including event collection, data vectorization, and feature expansion; then use the user-specific model to compute the model output probability.

[0037] Step 7: Process the model output probability using a continuous verification algorithm to obtain the final identity verification probability.

[0038] The continuous verification algorithm includes the following three mechanisms:

[0039] 1) Dynamic weighting mechanism: after each keystroke, the neural network model outputs a recognition probability $p_w$, where $w$ is the current time-step window. Only the probabilities of the most recent $W$ keystrokes, $\{p_{w-W+1}, \ldots, p_w\}$, are retained, and the recognition probability at each time point is given an exponentially decaying weight $w_t$ that decreases with age. A fault-tolerance mechanism for outlier removal is also introduced: among the $W$ keystroke probabilities, the $n$ largest and the $n$ smallest values (the outliers) are removed, and a weighted average probability is calculated from the remaining values;

[0040] 2) Dynamic fluctuation mechanism: if the recognition probability fluctuates beyond a set threshold within a certain period, the identity may have changed. A weighted fluctuation value is used to detect short-term fluctuations of the probability sequence within window $W$; if it exceeds the set threshold, the verification probability is fluctuating sharply and there may be an anomaly. An "identity suspicion" flag is then triggered, which can be logged and can require further multi-round authentication.

[0041] Attack pattern identification: building on fluctuation detection, a low-probability warning threshold $P_{min}$ is set. If more than half of the probabilities $p_i$ within window $W$ satisfy $p_i < P_{min}$, the system immediately triggers a "suspicious behavior" flag, which can be logged and can require further multi-round authentication. This quickly identifies an impostor entering destructive commands: because the impostor's keystroke characteristics differ significantly from the original user's, the recognition probability drops rapidly.

[0042] 3) Confidence assessment mechanism:

[0043] a. Mean and standard deviation of the most recent recognition probabilities: the confidence $\theta_t$ is computed dynamically from the data in the sliding window $W$;

[0044] b. Minimum confidence requirement: ensure that the confidence $\theta_t$ is not lower than the minimum confidence $\theta_{min}$ set by the system;

[0045] The system adapts to the authentication results within the sliding window: when the system's confidence $\theta_t$ in the user is high, $\theta_{min}$ can be lowered appropriately to tolerate some fluctuation; when $\theta_t$ is low, $\theta_{min}$ can be raised, making the system stricter.
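The three mechanisms of the continuous verification algorithm can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the patented formulas, which the text does not give: the decay form $\exp(-\lambda \cdot \text{age})$, the recency-weighted standard deviation used as the "weighted fluctuation value", the confidence $\theta_t = \text{mean} - k \cdot \text{std}$, and every threshold, step and limit are hypothetical choices, as are all function names.

```python
import math
from statistics import mean, pstdev


def decay_weights(n, decay):
    """Exponentially decaying weights; the newest of n probabilities gets weight 1."""
    return [math.exp(-decay * (n - 1 - i)) for i in range(n)]


def weighted_trimmed_probability(probs, W, n_trim, decay=0.1):
    """Mechanism 1: keep the last W probabilities, weight them by recency,
    drop the n_trim largest and n_trim smallest, return the weighted mean."""
    window = list(probs)[-W:]
    if len(window) <= 2 * n_trim:
        raise ValueError("window too short for outlier trimming")
    pairs = sorted(zip(window, decay_weights(len(window), decay)))  # sort by probability
    kept = pairs[n_trim:len(pairs) - n_trim]
    return sum(p * w for p, w in kept) / sum(w for _, w in kept)


def weighted_fluctuation(window, decay=0.1):
    """Mechanism 2 (assumed form): recency-weighted standard deviation."""
    w = decay_weights(len(window), decay)
    m = sum(wi * p for wi, p in zip(w, window)) / sum(w)
    return math.sqrt(sum(wi * (p - m) ** 2 for wi, p in zip(w, window)) / sum(w))


def window_flags(window, fluct_threshold, p_min, decay=0.1):
    """Mechanism 2 plus attack-pattern identification for one window."""
    flags = []
    if weighted_fluctuation(window, decay) > fluct_threshold:
        flags.append("identity_suspicion")
    if sum(p < p_min for p in window) > len(window) / 2:  # more than half below P_min
        flags.append("suspicious_behavior")
    return flags


def confidence(window, k=1.0):
    """Mechanism 3 (assumed form): mean minus k standard deviations, clipped to [0, 1]."""
    return min(1.0, max(0.0, mean(window) - k * pstdev(window)))


def update_theta_min(theta, theta_min, step=0.02, high=0.8, low=0.5, floor=0.3, ceil=0.9):
    """Lower theta_min when confidence is high, raise it when confidence is low."""
    if theta >= high:
        return max(floor, theta_min - step)
    if theta < low:
        return min(ceil, theta_min + step)
    return theta_min


def decide(window, theta_min):
    """Accept if theta_t >= theta_min; return the decision and the adapted threshold."""
    theta = confidence(window)
    return theta >= theta_min, update_theta_min(theta, theta_min)
```

Trimming by rank before weighting means a single misrecognized keystroke cannot dominate the average even if it is the most recent one; the decision in `decide` uses the current threshold and only then adapts it for the next window.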

[0046] The advantages of this invention are:

[0047] 1) This invention designs a neural network model structure based on feature attention and rhythm attention. The proposed feature attention can capture the rhythmic correlation hidden in the keystroke sequence, enabling the model to focus on key features and reduce dependence on irrelevant and noisy information, thereby improving the ability to extract complex and unstable user behavior features; the proposed rhythm attention can effectively mine the operating habits in intermittent keystroke data, further improving the verification accuracy.

[0048] 2) This invention innovatively proposes keyboard position function encoding, which enables feature expansion, gets rid of excessive dependence on a small number of features, reduces the risk of overfitting, and enhances the model's generalization ability.

[0049] 3) This invention also proposes a multi-sliding sequence method, which combines fixed-length sequence 1 for collaborative verification, sequence 2 for adaptively extracting short-text information, and three-key sequence 3 for quickly capturing changing trends, refining the processing of different behavior types.

[0050] 4) This invention proposes a continuous verification algorithm. By introducing time weighting, dynamic fluctuation, and confidence assessment mechanisms, it reduces the false rejection rate, further enhances the robustness of the system, and ensures efficient and accurate continuous identity verification.

Attached Figure Description

[0051] Figure 1 is a graph of the basic keystroke data;

[0052] Figure 2 is a diagram of the single-key input sequence data of the present invention;

[0053] Figure 3 is a heatmap of keystroke data statistics for the present invention;

[0054] Figure 4 is a diagram of the multi-sliding sequence method of the present invention;

[0055] Figure 5 is a diagram of the neural network model structure of the present invention;

[0056] Figure 6 is a diagram of the feature attention structure of the present invention;

[0057] Figure 7 is a diagram of the rhythm attention structure of the present invention.

Detailed Implementation

[0058] To facilitate understanding and implementation of the present invention by those skilled in the art, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. Obviously, the described embodiments are merely some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort should fall within the scope of protection of the present invention.

[0059] This invention proposes a continuous authentication method based on keystroke behavior characteristics, the specific steps of which are as follows:

[0060] Step 1: The system collects user keystroke data, obtaining basic keystroke data by listening to keyboard events and capturing the timestamp of each key, as shown in Figure 1, where "KeyPress" is the key-press event and "KeyRelease" is the key-release event; each event has a corresponding timestamp.

[0061] Step 2: Vectorize the basic keystroke data: pair press and release events to form complete single-key events, then calculate the various time intervals between adjacent keys to obtain keystroke vector data. As shown in Figure 1, "P" is the timestamp at which a key is pressed and "R" the timestamp at which it is released; "H" is the hold time between pressing and releasing the same key; "P[1]P[2]" is the interval from pressing the first key to pressing the second; "R[1]P[2]" is the interval from releasing the first key to pressing the second; "P[1]R[2]" is the interval from pressing the first key to releasing the second; and "R[1]R[2]" is the interval from releasing the first key to releasing the second. As shown in Figure 2, the keystroke vectors are concatenated along the time axis in order of their original press timestamps to form the complete single-key input sequence data.
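Steps 1 and 2 can be sketched in Python as follows. This is a minimal sketch, assuming events arrive as (key, kind, timestamp) records with timestamps in seconds; `KeyEvent`, `pair_events`, `vectorize`, the field names `PP`/`RP`/`PR`/`RR`, the handling of auto-repeat presses, and zero-padding of the last key's intervals are hypothetical choices, not details from the patent.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    key: str          # key identifier, e.g. "h" or "space"
    kind: str         # "KeyPress" or "KeyRelease", as in Figure 1
    timestamp: float  # event time; seconds are assumed here


def pair_events(events):
    """Pair each KeyPress with the next KeyRelease of the same key.

    Returns (key, P, R) tuples sorted by press time P. Repeated presses of a
    key that is already held down (auto-repeat) are ignored, an assumption.
    """
    pending = {}  # key -> press timestamp of a key currently held down
    strokes = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind == "KeyPress":
            pending.setdefault(ev.key, ev.timestamp)
        elif ev.kind == "KeyRelease" and ev.key in pending:
            strokes.append((ev.key, pending.pop(ev.key), ev.timestamp))
    strokes.sort(key=lambda s: s[1])  # order by original press timestamp
    return strokes


def vectorize(strokes):
    """One keystroke vector per key: hold time H plus the four intervals
    to the next key (P[1]P[2], R[1]P[2], P[1]R[2], R[1]R[2])."""
    vectors = []
    for i, (key, p1, r1) in enumerate(strokes):
        if i + 1 < len(strokes):
            _, p2, r2 = strokes[i + 1]
            pp, rp, pr, rr = p2 - p1, p2 - r1, r2 - p1, r2 - r1
        else:
            pp = rp = pr = rr = 0.0  # last key has no successor: zero-padded (assumption)
        vectors.append({"key": key, "H": r1 - p1, "PP": pp, "RP": rp, "PR": pr, "RR": rr})
    return vectors
```

For example, pressing "h" at 0.00 s, releasing it at 0.10 s and pressing "i" at 0.12 s gives H = 0.10 and R[1]P[2] = 0.02 for the first vector. R[1]P[2] becomes negative when the next key is pressed before the previous one is released (key rollover), so this sketch leaves the intervals unclipped.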

[0062] Step 3: Enhance the features of the single-key input sequence data by using keyboard position function encoding.

[0063] This study adopted the ANSI standard keyboard layout and statistically analyzed the frequency of each key in the main functional areas of the dataset, as shown in Figure 3. The keyboard position function encoding is built from each user's data to represent what each key means in that user's actual usage. Many key sequences, especially those forming words, become muscle memory and are typed fluently as whole words or even sentences. The more frequently a key is used and the more proficient the user, the smaller the keystroke error and the higher the confidence of the judgment.

[0064] This encoding structure comprehensively describes key attributes through multiple fields:

[0065] 1) Position indicator: indicates whether the key has fixed horizontal and vertical coordinates ("1" for fixed, "0" for not fixed).

[0066] 2) Frequency of Use Indicators: Based on the frequency and importance of key presses during input, five levels are used to reflect the user's proficiency. For example, "4" represents frequently used high-frequency keys (such as letters, Enter, and Space); "3" represents navigation keys, shortcut keys, and commonly used symbols; "2" represents other function symbols; "1" corresponds to the numeric keypad; and "0" indicates that the key is not used.

[0067] 3) Vertical position: Use values from 0 to 4 to mark the row where the key is located, increasing sequentially starting from the bottom empty row.

[0068] 4) Horizontal position: The left and right edges of each key are described with values from 0 to 270 to give its horizontal coordinates, so that key widths and spacing are represented consistently.

[0069] 5) Left and right hand area division: The keys are classified and distinguished for left and right hand use (for example, "1" is the left hand area, "2" is the right hand area, "3" represents the space bar, and "0" represents none).

[0070] 6) Word / sentence breakpoint end marker: Mark the key used as the word segmentation breakpoint (such as semicolon, comma, period, question mark, Enter, space, Esc, represented by "1", otherwise "0").

[0071] 7) Function area marking: Classify the keys by function, such as "5" for letter keys, "4" for shortcut combination keys, "3" for symbol keys, "2" for single function keys, "1" for number keys, and "0" for none.
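The seven fields above can be appended to each keystroke vector with a lookup table, as in the sketch below. It assumes keystroke vectors are dicts with a `"key"` entry; the concrete code tuples for the five example keys are hypothetical values chosen within the ranges stated in [0065] to [0071] (rows counted up from the space-bar row, x positions on the 0 to 270 scale), not values published in the patent, and `FIELDS`, `KEY_CODES` and `encode` are illustrative names.

```python
# Field order follows paragraphs [0065]-[0071].
FIELDS = ("position_fixed", "frequency_level", "row", "x_position",
          "hand", "breakpoint", "function_area")

# HYPOTHETICAL example codes for a few ANSI keys. Hand: 1 left, 2 right,
# 3 space bar. Function area: 5 letter, 3 symbol, 2 single-function key.
# In the patent, frequency levels are derived from each user's own data,
# so a real table would be rebuilt per user.
KEY_CODES = {
    "a":     (1, 4, 2, 20, 1, 0, 5),
    "j":     (1, 4, 2, 130, 2, 0, 5),
    ",":     (1, 3, 1, 160, 2, 1, 3),
    "enter": (1, 4, 2, 245, 2, 1, 2),
    "space": (1, 4, 0, 100, 3, 1, 2),
}
UNKNOWN = (0, 0, 0, 0, 0, 0, 0)  # fallback for keys missing from the table


def encode(vectors, table=KEY_CODES):
    """Feature expansion: append the seven encoding fields to each keystroke vector."""
    return [{**v, **dict(zip(FIELDS, table.get(v["key"], UNKNOWN)))} for v in vectors]
```

The timing features are kept unchanged and the encoding fields are added alongside them, which is what lets the model see where and with which hand a key was typed in addition to how fast.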

[0072] Step 4: Divide the keystroke data into three input data sequences of different lengths using the multi-sliding sequence method, and input them into the neural network model respectively.

[0073] In existing technologies, a verification result is provided only after a sequence of a certain length has been collected. However, in extreme attack scenarios or enterprise operation and maintenance scenarios, an attacker may perform only a short sequence of operations, so most of the data in the verification window still comes from the legitimate user, and the model misclassifies the input as legitimate. This invention therefore proposes a multi-sliding sequence method. For example, when a user types the phrase "continuous authentication" on the keyboard, a series of keystrokes is obtained, and the corresponding sequence contents are shown in Figure 4.

[0074] Sequence 1: Divide the single-key input sequence data into data of fixed length L.

[0075] Sequence 2: Based on the user input, extract words or pinyin sequences from the single-key input sequence data using a simplified adaptive word segmentation method; the adaptive word segmentation method consists of the following three rules:

[0076] 1) Breakpoint marker: When the input key has a word / phrase breakpoint marker in the keyboard position function encoding, it will be automatically separated.

[0077] 2) Interval time: If the time interval between two keys is too long, the system will automatically identify it as a separation point and perform word segmentation.

[0078] 3) Sequence length: When the length of sequence 2, i.e. the number of keystroke vectors in sequence 2, exceeds the fixed length of sequence 1, the system will automatically divide it into independent subsequences.

[0079] Sequence 3: Divide the single-key input sequence data into individual data represented by a combination of three keys.

[0080] This method combines multiple sequences (fixed length, word sequences, and three-key combinations) and inputs them into a neural network model for parallel collaborative verification. This enables the model to adapt to different keystroke behaviors and respond quickly to attacks or substitutions, detecting anomalies.
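The three sequences can be sketched as below. This is a minimal sketch, assuming keystroke vectors are dicts carrying an `"RP"` interval (release of this key to press of the next) and a `"breakpoint"` flag from the keyboard position function encoding. The choices that breakpoint keys are dropped from the word they close, that the pause in rule 2 is measured by `"RP"`, that sequence 1 defaults to non-overlapping windows, and that sequence 3 slides with stride 1 are all assumptions; `max_gap` stands in for the unspecified "too long" interval.

```python
def sequence_1(vectors, L, stride=None):
    """Fixed-length windows of L keystrokes. stride=None gives non-overlapping
    windows (stride L); a smaller stride gives overlapping sliding windows."""
    stride = stride or L
    return [vectors[i:i + L] for i in range(0, len(vectors) - L + 1, stride)]


def sequence_2(vectors, L, max_gap):
    """Simplified adaptive word segmentation using the three rules of [0076]-[0078]."""
    words, current = [], []
    for i, v in enumerate(vectors):
        if v.get("breakpoint") == 1:
            # Rule 1: a breakpoint key (space, Enter, punctuation...) closes the word;
            # the separator itself is not kept in the word (an assumption).
            if current:
                words.append(current)
            current = []
            continue
        if current and vectors[i - 1]["RP"] > max_gap:
            # Rule 2: a long pause after the previous key starts a new word.
            words.append(current)
            current = []
        current.append(v)
        if len(current) == L:
            # Rule 3: a word sequence never grows beyond the sequence-1 length L.
            words.append(current)
            current = []
    if current:
        words.append(current)
    return words


def sequence_3(vectors):
    """Overlapping three-key combinations (trigraphs), stride 1 assumed."""
    return [vectors[i:i + 3] for i in range(len(vectors) - 2)]
```

Because sequence 3 needs only three keystrokes, it can react to a short burst of impostor input long before a full fixed-length window of sequence 1 has filled, which is the motivation given in [0073].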

[0081] Step 5: Input the user's keystroke data into the neural network model to train it and obtain a user-specific model. The structure of the neural network model is shown in Figure 5: first, all input data are standardized with the user's mean and standard deviation; the data then pass through a feature attention module and a rhythm attention module, followed by multiple fully connected layers with Dropout; finally, the output is passed through an activation function to obtain the model output probability.

[0082] An individual's keystroke rhythm is a continuous process: keystrokes combine letters into words that carry semantic information, so the most strongly related keystrokes are those that are temporally adjacent. The position of a key affects both how long it is held and the interval between keystrokes. Existing techniques use CNNs for feature extraction, but CNNs have limited receptive fields: they can highlight only certain features and struggle to represent complex relationships between features or the degree of correlation between them. This invention therefore designs a feature attention mechanism to capture the relationships between the features of a key and those of its adjacent keys. The mechanism also explores latent fine-grained correlations and temporal dynamic weights in the keystroke data, enhancing the model's ability to capture features in adjacent sequences. The structure of the feature attention mechanism is shown in Figure 6, and the specific steps are as follows:
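The first and last stages of this model (per-user standardization and the fully connected head) can be sketched with NumPy as follows. This is a minimal sketch under assumptions: statistics are taken over all enrollment windows and time steps, ReLU is used between fully connected layers, inverted Dropout is applied only in training, the unspecified "activation function" is taken to be a sigmoid, and the attention output is assumed to have been pooled to one vector per window before the head. `fit_user_scaler`, `standardize` and `classifier_head` are hypothetical names.

```python
import numpy as np


def fit_user_scaler(X):
    """Per-user mean and standard deviation of each feature, over all
    enrollment windows and time steps. X has shape (N, T, F)."""
    flat = X.reshape(-1, X.shape[-1])
    mean, std = flat.mean(axis=0), flat.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # constant features: avoid division by zero
    return mean, std


def standardize(X, mean, std):
    return (X - mean) / std


def classifier_head(h, layers, p_drop=0.5, training=False, rng=None):
    """Fully connected layers with Dropout, then a sigmoid output probability.

    h: (B, D) features, e.g. the attention output pooled over time (assumed).
    layers: list of (W, b) pairs; the last layer must have one output unit.
    """
    rng = np.random.default_rng() if rng is None else rng
    for i, (W, b) in enumerate(layers):
        h = h @ W + b
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)  # ReLU between layers (assumption)
            if training:  # inverted dropout: active only during training
                keep = rng.random(h.shape) >= p_drop
                h = h * keep / (1.0 - p_drop)
    return 1.0 / (1.0 + np.exp(-h[:, 0]))  # sigmoid -> probability of the genuine user
```

The scaler is fitted once on the enrolling user's data and then reused unchanged in the identity verification stage, so that live keystrokes are measured against that user's own baseline.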

[0083] First, receive the input data. Where R represents the data matrix, B represents the batch size, T represents the input sequence length, and F represents the feature dimension. For each pair of adjacent keystroke sequence vectors x in the input... t x t+1 The features are concatenated to form a new feature vector z. t :

[0084] Then, the feature vector z is processed through a fully connected layer. t Project the vector to generate the query vector Q. t It performs comprehensive feature mapping and dimensionality reduction, eliminating redundant features shared between sequences while strengthening the connections between features, where W q b represents the weight. q Let R represent the bias term, R represent the data matrix, F represent the feature dimension, and t represent the time step.

[0085] $Q_t = W_q z_t + b_q$

[0086] The feature vector $z_t$ is then multiplied by the learnable weight matrices $W_k$ and $W_v$ to generate the key vector $K_t \in \mathbb{R}^{F}$ and the value vector $V_t \in \mathbb{R}^{F}$, respectively, where t denotes the time step:

[0087] $K_t = W_k z_t$

[0088] $V_t = W_v z_t$

[0089] Next, the dot product of the query vector $Q_t$ and the transpose of the key vector, $K_t^{\top}$, yields the general attention weight $A_t \in \mathbb{R}^{F \times F}$, which is normalized with Softmax; $^{\top}$ denotes matrix transposition:

[0090] $A_t = \mathrm{Softmax}(Q_t K_t^{\top})$

[0091] The attention weight $A_t$ is then multiplied by the transpose of the value vector to generate a new feature vector $h_t \in \mathbb{R}^{F}$; $^{\top}$ denotes matrix transposition:

[0092]

[0093] Finally, the feature vectors $h_t$ from all time steps are stacked into a new output sequence $H \in \mathbb{R}^{B \times T \times F}$, where B is the batch size, T is the input sequence length, and F is the feature dimension; $^{\top}$ denotes matrix transposition:

[0094]

[0095] The feature attention module enhances the model's expressive power by capturing the dynamic relationships between features at adjacent time steps and generating new time series features.
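The feature attention steps can be illustrated with a dependency-free sketch for a single sequence (no batch axis). It assumes $Q_t$, $K_t$ and $V_t$ are column vectors in $\mathbb{R}^F$, so that $Q_t K_t^{\top}$ is an F×F matrix normalized row by row with Softmax; under that convention the shape-consistent reading of "multiply $A_t$ by the (transposed) value vector" is $h_t = A_t V_t$, which is what the sketch computes. That reading is an interpretation, not confirmed by the source. The sketch also yields T−1 outputs, one per adjacent pair, whereas the patent's H has length T; how the last step is handled is not stated. Function names and weight shapes are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(v: Vector) -> Vector:
    """Numerically stable softmax over a vector."""
    top = max(v)
    exps = [math.exp(x - top) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def feature_attention(x: Matrix, w_q: Matrix, b_q: Vector,
                      w_k: Matrix, w_v: Matrix) -> Matrix:
    """x: T x F sequence; w_q, w_k, w_v: F x 2F. Returns (T - 1) x F."""
    h = []
    for t in range(len(x) - 1):
        z = x[t] + x[t + 1]                               # z_t = Concat(x_t, x_{t+1}), length 2F
        q = [a + b for a, b in zip(matvec(w_q, z), b_q)]  # Q_t = W_q z_t + b_q
        k = matvec(w_k, z)                                # K_t = W_k z_t
        v = matvec(w_v, z)                                # V_t = W_v z_t
        # A_t = Softmax(Q_t K_t^T): F x F outer product, normalized row by row
        a = [softmax([qi * kj for kj in k]) for qi in q]
        h.append(matvec(a, v))                            # h_t = A_t V_t
    return h


if __name__ == "__main__":
    random.seed(0)
    T, F = 5, 3

    def rand_matrix(rows: int, cols: int) -> Matrix:
        return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    x = rand_matrix(T, F)
    out = feature_attention(x, rand_matrix(F, 2 * F), [0.0] * F,
                            rand_matrix(F, 2 * F), rand_matrix(F, 2 * F))
    print(len(out), len(out[0]))  # 4 3
```

Because every row of $A_t$ sums to 1, each entry of $h_t$ is a convex combination of the entries of $V_t$, which is a convenient sanity check on any implementation.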

[0096] During normal typing, keystroke sequences, and the attention between keystroke sequences and word sequences, are often interrupted by pauses for thought, which weakens contextual relevance. Moreover, when outliers are present, point-wise attention mechanisms can distort the overall weights. This patent therefore uses convolution to extract keystroke trends over a span of time, capturing overall information at the word level. In parallel, another layer focuses on extracting the correlation between keys; in practical use it suppresses irrelevant discrete values caused by pauses for thought and strengthens truly relevant keystroke information. Because attention has already been applied along the feature dimension, this module only needs to apply attention along the sequence dimension, which reduces computational complexity.

[0097] Psychological research shows that typical pauses occur between words, that is, roughly every 6 to 8 letters in English. Besides pauses, typing errors also occur and must be deleted or undone; in these cases, the mistyped keys are physically adjacent on the keyboard. Positional encoding establishes similarity between keys, reducing the impact of such errors on learning the differences between users. Establishing different receptive fields lets the attention mechanism focus on the correct keystroke information across scenarios and tasks; for example, the error rate is lower in familiar typing scenarios but may be higher in unfamiliar transcription tasks. The structure of rhythm attention is shown in Figure 7, and the specific steps are as follows:

[0098] First, the input data H is received, and the positional encoding function PE is added along the time dimension to obtain the input feature matrix $H_{input}$. The positional encoding strengthens the temporal information in the sequence. Here t denotes the time step, T the input sequence length, f the feature index, and F the feature dimension.

[0099]

[0100]
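The form of PE is not reproduced in this text. A common choice, assumed here for illustration, is the sinusoidal encoding from the Transformer literature (sine on even feature indices, cosine on odd ones, base 10000), added element-wise so that $H_{input} = H + PE$. The base and the function names are assumptions, not taken from the patent.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinusoidal_pe(T: int, F: int, base: float = 10000.0) -> Matrix:
    """PE[t][2i] = sin(t / base^(2i/F)), PE[t][2i+1] = cos(t / base^(2i/F))."""
    pe = [[0.0] * F for _ in range(T)]
    for t in range(T):
        for f in range(F):
            angle = t / (base ** ((f - f % 2) / F))
            pe[t][f] = math.sin(angle) if f % 2 == 0 else math.cos(angle)
    return pe


def add_positional_encoding(h: Matrix) -> Matrix:
    """H_input = H + PE, added element-wise along the time dimension."""
    pe = sinusoidal_pe(len(h), len(h[0]))
    return [[a + b for a, b in zip(row_h, row_pe)] for row_h, row_pe in zip(h, pe)]


if __name__ == "__main__":
    print(sinusoidal_pe(1, 4))  # [[0.0, 1.0, 0.0, 1.0]]
```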

[0101] Next, each sequence of the input feature matrix $H_{input}$ is passed through a fully connected mapping into a lower-dimensional space for dimensionality reduction and compression, producing the compressed feature representation $H_{dense}$. Here $W_d$ is the weight, $b_d$ is the bias term, t denotes the time step, and T the input sequence length.

[0102] $H_{dense,t} = W_d H_{input,t} + b_d, \quad t = 1, \dots, T$

[0103] For character-level features, $H_{input}$ is used directly as the input matrix $H_{char}$. For word-level and phrase-level features, one-dimensional convolutions with different kernel sizes extract features, yielding the input matrices $H_{word}$ and $H_{phrase}$. Here l denotes the feature level, k the kernel size, and s the kernel stride:

[0104]
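The word-level and phrase-level branches can be illustrated with a valid (unpadded) 1-D convolution over time. This sketch assumes the convolution consumes a T×F matrix with stride s; the patent states neither whether the input is $H_{input}$ or $H_{dense}$ nor the actual kernel sizes, so the averaging kernels of size 3 and 5 in the demo are hypothetical placeholders for learned weights.

```python
from typing import List

Matrix = List[List[float]]
Kernel = List[Matrix]  # C_out x k x F_in


def conv1d(x: Matrix, kernel: Kernel, bias: List[float], stride: int = 1) -> Matrix:
    """Valid 1-D convolution over time. x: T x F_in. Returns T_out x C_out,
    with T_out = (T - k) // stride + 1 (empty if T < k)."""
    k = len(kernel[0])
    t_out = max(0, (len(x) - k) // stride + 1)
    out = []
    for j in range(t_out):
        window = x[j * stride: j * stride + k]
        out.append([
            sum(w * v for w_row, x_row in zip(w_c, window) for w, v in zip(w_row, x_row)) + b_c
            for w_c, b_c in zip(kernel, bias)
        ])
    return out


def averaging_kernel(k: int, f_in: int) -> Kernel:
    """One output channel per input feature, each averaging that feature over k steps."""
    return [[[1.0 / k if f == c else 0.0 for f in range(f_in)] for _ in range(k)]
            for c in range(f_in)]


if __name__ == "__main__":
    h_char = [[float(t), 10.0 * t] for t in range(1, 9)]  # T = 8, F = 2
    h_word = conv1d(h_char, averaging_kernel(3, 2), [0.0, 0.0], stride=1)
    h_phrase = conv1d(h_char, averaging_kernel(5, 2), [0.0, 0.0], stride=2)
    print(len(h_char), len(h_word), len(h_phrase))  # 8 6 2
```

Larger kernels and strides summarize longer spans, which matches the intent of giving the word and phrase levels wider receptive fields than the character level.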

[0105] Each resulting matrix is then multiplied by the learnable weight matrices $W^{Q}$, $W^{K}$, and $W^{V}$ to generate the Q, K, and V vectors, and scaled dot-product attention is computed to obtain the attention output score matrix $A_l$. Because the feature matrix is compressed into a single descriptor, the attention matrix in this step is computed as a one-dimensional vector, which reduces the time complexity from quadratic, $O(n^2)$, to linear. Here l denotes the feature level and d denotes the variance used for scaling:

[0106]
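For reference, the scaled dot-product attention named in this step is sketched below in its standard form, $A_l = \mathrm{Softmax}(Q_l K_l^{\top} / \sqrt{d})$ followed by $O_l = A_l V_l$, with d taken as the key dimension (the usual convention). This is the ordinary quadratic-cost formulation; the single-descriptor compression that the patent says makes the cost linear is not described in enough detail here to reproduce, so it is not shown. Weight names and shapes are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(r) for r in zip(*m)]


def softmax(v: List[float]) -> List[float]:
    top = max(v)
    exps = [math.exp(x - top) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(h_l: Matrix, w_q: Matrix, w_k: Matrix,
                                 w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """h_l: T_l x F level input. Returns (A_l, O_l); A_l has shape T_l x T_l."""
    q, k, v = matmul(h_l, w_q), matmul(h_l, w_k), matmul(h_l, w_v)
    d = len(k[0])
    scores = matmul(q, transpose(k))
    a = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return a, matmul(a, v)


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    # Identical rows give identical scores, so attention is uniform (1/3 each).
    a, o = scaled_dot_product_attention([[1.0, 0.0]] * 3, eye, eye, eye)
    print([round(x, 6) for x in o[0]])  # [1.0, 0.0]
```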

[0107] The attention output score matrix $A_l$ is multiplied by the value matrix V to yield the output $O_l$, and layer normalization is then applied to normalize the feature distribution, giving the attention output $Z_l$ for each level. Here l denotes the feature level and LayerNorm denotes layer normalization:

[0108] $O_l = A_l V_l$

[0109] $Z_l = \mathrm{LayerNorm}(O_l)$

[0110] Finally, the attention outputs at the character, word, and phrase levels are concatenated to obtain the final computation matrix $H_{concat}$, where Concat denotes the concatenation operation.

[0111] $H_{concat} = \mathrm{Concat}(Z_{char}, Z_{word}, Z_{phrase})$
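A minimal sketch of the last two operations, assuming LayerNorm without a learnable gain and bias, and assuming concatenation along the time axis so that levels of different lengths (the convolutions shorten the word and phrase sequences) can be joined. The patent does not state the concatenation axis, and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def layer_norm(m: Matrix, eps: float = 1e-5) -> Matrix:
    """Normalize each row (time step) to zero mean and unit variance."""
    out = []
    for row in m:
        mu = sum(row) / len(row)
        var = sum((x - mu) ** 2 for x in row) / len(row)
        out.append([(x - mu) / math.sqrt(var + eps) for x in row])
    return out


def concat_levels(outputs: Dict[str, Matrix],
                  order: Sequence[str] = ("char", "word", "phrase")) -> Matrix:
    """Z_l = LayerNorm(O_l) for each level, then stack the levels along time."""
    h_concat: Matrix = []
    for level in order:
        h_concat.extend(layer_norm(outputs[level]))
    return h_concat


if __name__ == "__main__":
    o = {
        "char": [[1.0, 2.0, 3.0]] * 4,
        "word": [[0.5, 0.5, 2.0]] * 2,
        "phrase": [[3.0, 1.0, 2.0]],
    }
    print(len(concat_levels(o)))  # 7
```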

[0112] Step 6: Enter the identity verification stage. Process the data in the same way as in the keystroke data collection stage, including event collection, data vectorization, and feature expansion. Then use the user-specific model to compute the model output probability through matrix operations.

[0113] Step 7: Process the model output probability using a continuous verification algorithm to obtain the final identity verification probability.

[0114] The continuous verification algorithm includes the following three mechanisms:

[0115] 1) Dynamic weighting mechanism: After each keystroke, the neural network model outputs a recognition probability $p_w$, where w is the current time step. Only the probability sequence of the most recent W keystrokes, $\{p_{w-W+1}, \dots, p_w\}$, is retained, and the recognition probability at each time point is assigned an exponentially decaying weight $\omega_t$ that decays over time, where α is the decay factor (0 < α < 1), w is the current time step, and W is the window size. As a result, the most recent recognition probabilities receive the highest weights.

[0116] $\omega_t = \alpha^{\,w-t}, \quad t = w-W+1, \dots, w$
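A sketch of the window and its weights, assuming the decay takes the form $\omega_t = \alpha^{w-t}$, so the oldest retained probability gets $\alpha^{W-1}$ and the newest gets 1. A `collections.deque` with `maxlen=W` keeps only the most recent W probabilities; the class and function names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


def decay_weights(n: int, alpha: float) -> List[float]:
    """Weights for n probabilities ordered oldest -> newest: alpha**(n-1), ..., alpha**0."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("decay factor alpha must lie in (0, 1)")
    return [alpha ** (n - 1 - i) for i in range(n)]


class ProbabilityWindow:
    """Keeps only the W most recent recognition probabilities p_{w-W+1}, ..., p_w."""

    def __init__(self, size: int) -> None:
        self.probs: Deque[float] = deque(maxlen=size)

    def push(self, p: float) -> None:
        self.probs.append(p)  # the oldest value is dropped automatically once full

    def weighted(self, alpha: float) -> List[Tuple[float, float]]:
        """Pairs (p_t, omega_t) for the current window contents."""
        probs = list(self.probs)
        return list(zip(probs, decay_weights(len(probs), alpha)))


if __name__ == "__main__":
    win = ProbabilityWindow(size=3)
    for p in (0.9, 0.8, 0.95, 0.7):
        win.push(p)
    print(win.weighted(alpha=0.5))  # [(0.8, 0.25), (0.95, 0.5), (0.7, 1.0)]
```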

[0117] To limit the impact of random fluctuations, an outlier-removal mechanism is introduced: among the W keystroke probabilities, the n largest and the n smallest values (the outliers) are removed, and a weighted average probability is calculated from the remaining values:

[0118]
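The outlier-removal step can be sketched as a trimmed weighted mean: sort the (probability, weight) pairs by probability, discard the n lowest and n highest, and average the rest with their decay weights. This assumes each removed value takes its weight with it and that the window holds more than 2n values; the function name is hypothetical.

```python
from typing import Sequence


def trimmed_weighted_mean(probs: Sequence[float], weights: Sequence[float], n: int) -> float:
    """Weighted mean of probs after dropping the n smallest and n largest values."""
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    if len(probs) <= 2 * n:
        raise ValueError("window must hold more than 2n probabilities")
    pairs = sorted(zip(probs, weights))   # sort by probability
    kept = pairs[n:len(pairs) - n]        # remove the n minima and the n maxima
    total_w = sum(w for _, w in kept)
    return sum(p * w for p, w in kept) / total_w


if __name__ == "__main__":
    alpha = 0.8
    probs = [0.91, 0.12, 0.88, 0.86, 0.99]  # 0.12 is a transient dip
    weights = [alpha ** (len(probs) - 1 - i) for i in range(len(probs))]
    print(round(trimmed_weighted_mean(probs, weights, n=1), 4))
```

Slicing with `pairs[n:len(pairs) - n]` rather than `pairs[n:-n]` keeps the `n = 0` case (no trimming) correct.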

[0119] 2) Dynamic fluctuation mechanism: To better accommodate natural fluctuations during identity authentication, a dynamic anomaly detection mechanism is introduced. If the recognition probability fluctuates beyond a set threshold within a certain period, this may indicate a change of identity. The following weighted fluctuation value is used for short-term fluctuation detection on the probability sequence within the window, where W is the window size, i indexes the time points in the window, ω is the exponentially decaying weight, and p is the recognition probability.

[0120]

[0121] If the threshold Δσ is exceeded, the verification probability is fluctuating sharply and an anomaly may be present. This can trigger an "identity suspicion" flag, which is logged and requires further multi-round authentication.
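The weighted fluctuation formula itself is not recoverable from this text. One plausible reading, assumed for this sketch, is the weighted standard deviation of the windowed probabilities around their weighted mean, compared against the threshold Δσ (`delta_sigma` below). Treat both the formula and the names as hypothetical.

```python
import math
from typing import Sequence


def weighted_fluctuation(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted standard deviation of the probabilities in the window (assumed form)."""
    total = sum(weights)
    mean = sum(w * p for w, p in zip(weights, probs)) / total
    var = sum(w * (p - mean) ** 2 for w, p in zip(weights, probs)) / total
    return math.sqrt(var)


def identity_suspicion(probs: Sequence[float], weights: Sequence[float],
                       delta_sigma: float) -> bool:
    """True when the window fluctuates by more than delta_sigma ("identity suspicion")."""
    return weighted_fluctuation(probs, weights) > delta_sigma


if __name__ == "__main__":
    w = [0.5 ** (4 - i) for i in range(5)]    # exponential decay, newest weight = 1.0
    steady = [0.90, 0.92, 0.91, 0.89, 0.90]
    abrupt = [0.90, 0.92, 0.91, 0.35, 0.20]   # sudden drop at the end of the window
    print(identity_suspicion(steady, w, 0.1), identity_suspicion(abrupt, w, 0.1))  # False True
```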

[0122] Attack pattern identification: Building on fluctuation detection, a low-probability warning threshold $P_{min}$ is set, for example 0.3. If more than half of the probabilities $p_i$ within window W satisfy $p_i < P_{min}$, the system immediately raises a "suspicious behavior" flag, which is logged and requires further multi-round authentication.

[0123] This quickly identifies cases in which an intruder impersonating the user enters destructive commands: because the intruder's keystroke characteristics differ significantly from those of the original user, the recognition probability drops rapidly.
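The attack-pattern rule is simple enough to state directly in code: flag the window when strictly more than half of its probabilities fall below $P_{min}$, using the example value 0.3 as the default. The function name is hypothetical.

```python
from typing import Sequence


def suspicious_behavior(probs: Sequence[float], p_min: float = 0.3) -> bool:
    """True if more than half of the window's probabilities satisfy p_i < p_min."""
    below = sum(1 for p in probs if p < p_min)
    return below > len(probs) / 2


if __name__ == "__main__":
    print(suspicious_behavior([0.10, 0.22, 0.91, 0.15]))  # True  (3 of 4 below 0.3)
    print(suspicious_behavior([0.10, 0.91, 0.88, 0.20]))  # False (exactly half is not more than half)
```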

[0124] 3) Confidence assessment mechanism: To balance real-time performance against fault tolerance, and to avoid wrongly rejecting or over-trusting users in certain situations, a dynamic threshold can be designed, with a confidence score $\theta_t$ computed dynamically for each window based on the following two points:

[0125] a. Mean and standard deviation of the most recent recognition probabilities: the confidence $\theta_t$ is computed dynamically from the data in the sliding window W, using the weighted average $\bar{p}$ within the window, the standard deviation $\sigma$, and a control parameter k (e.g., k = 3):

[0126]

[0127] b. Minimum confidence requirement: ensure that the confidence $\theta_t$ is not lower than the system-defined minimum confidence $\theta_{min}$.

[0128] Adaptive adjustment based on the authentication results within the sliding window: when the system's confidence $\theta_t$ in the user is high, $\theta_{min}$ can be lowered appropriately to tolerate some fluctuation; when $\theta_t$ is low, $\theta_{min}$ can be raised, increasing the rigor of the system.
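Because the confidence formula is lost from this text, the following sketch assumes $\theta_t = \bar{p} - k\sigma$, a lower confidence bound built from the weighted mean $\bar{p}$ and standard deviation $\sigma$ of the probabilities in the window. The adaptive rule is modeled with hypothetical cut-offs (`high`, `low`), a hypothetical step size, and hypothetical bounds on $\theta_{min}$; none of these values come from the patent.

```python
def confidence(mean_p: float, std_p: float, k: float = 3.0) -> float:
    """ASSUMED form: theta_t = mean - k * std (a lower confidence bound)."""
    return mean_p - k * std_p


def update_theta_min(theta_t: float, theta_min: float, high: float = 0.8, low: float = 0.5,
                     step: float = 0.02, floor: float = 0.3, ceil: float = 0.9) -> float:
    """Lower theta_min when confidence is high, raise it when low (hypothetical values)."""
    if theta_t > high:
        return max(floor, theta_min - step)  # tolerate more fluctuation
    if theta_t < low:
        return min(ceil, theta_min + step)   # tighten verification
    return theta_min


def accept(theta_t: float, theta_min: float) -> bool:
    """Requirement b: the confidence must not fall below theta_min."""
    return theta_t >= theta_min


if __name__ == "__main__":
    theta_min = 0.6
    for mean_p, std_p in [(0.92, 0.02), (0.90, 0.03), (0.55, 0.15)]:
        theta_t = confidence(mean_p, std_p)
        theta_min = update_theta_min(theta_t, theta_min)
        print(round(theta_t, 2), round(theta_min, 2), accept(theta_t, theta_min))
```

Clamping $\theta_{min}$ between a floor and a ceiling keeps a long run of confident windows from relaxing the check indefinitely; whether the patent bounds $\theta_{min}$ this way is not stated.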

[0129] This continuous verification algorithm combines multiple factors into a dynamically adjustable verification procedure. It accounts for the real-time characteristics of keystroke data and adapts to changes over time and in user behavior. It can serve different application scenarios, reducing false alarms during long-term monitoring of a single user while giving early warning of sudden identity changes, thus balancing real-time performance and fault tolerance.

[0130] Experimental setup:

[0131] This experiment uses Python as the programming language for deep learning models and data preprocessing, and the Keras deep learning framework as the tool for model building and some data preprocessing operations.

[0132] To ensure the reproducibility and stability of the experimental results, this invention uses a fixed random seed and splits the keystroke sequence data into training, validation, and test sets in a 7:1:2 ratio. The initial learning rate is set to 0.1 with a decay strategy, and training uses the AdamW optimizer with L2 regularization, binary cross-entropy loss, and an early-stopping callback. In all experiments, each model is trained three times and the validation results are averaged to reduce the effect of randomness. On the test set, a threshold of 0.5 is applied to the recognition probability, the sequence length is set to 30, and the EER is calculated. The experiments compare the following methods on the Buffalo and Clarkson II datasets (see Tables 1 and 2): SVM, the Gunetti & Picardi distance metric, KDE, CNN+RNN, TypeNet, CKDAN, and TKCA, which was tested only on the Clarkson II dataset.
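The EER reported below is the operating point at which the false acceptance rate (FAR) equals the false rejection rate (FRR). A common discrete approximation, sketched here under the assumption that higher scores mean "genuine user", sweeps every observed score as a threshold and averages FAR and FRR where they are closest. The function names are hypothetical, and this is not necessarily the evaluation code used in these experiments.

```python
from typing import Sequence, Tuple


def far_frr(genuine: Sequence[float], impostor: Sequence[float], thr: float) -> Tuple[float, float]:
    """FAR: impostors accepted (score >= thr). FRR: genuine users rejected (score < thr)."""
    far = sum(1 for s in impostor if s >= thr) / len(impostor)
    frr = sum(1 for s in genuine if s < thr) / len(genuine)
    return far, frr


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Average of FAR and FRR at the observed threshold where they are closest."""
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, thr)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7, 0.6]
    impostor = [0.1, 0.2, 0.3, 0.65]
    print(equal_error_rate(genuine, impostor))  # 0.25
```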

[0133] The results indicate that:

[0134] The TypeNet method performed worst on both datasets: its LSTM model and triplet loss function failed to capture the complexity of keystroke behavior effectively, resulting in poor performance.

[0135] The limitation of the SVM method is that it cannot effectively represent the behavioral features of all possible keystroke combinations, and its performance on both datasets is relatively poor.

[0136] The Gunetti & Picardi method is more effective on fixed text, where features are more concentrated, but performs worse on free text because keystroke variation is greater.

[0137] The KDE method has limited applicability and poor performance.

[0138] The TKCA method accounts for the diversity of inputs in unconstrained environments, but performs poorly because of its weak ability to represent behavioral features. CNN+RNN is limited in extracting the associations between adjacent features and has difficulty identifying correlations between distant keystrokes in long keystroke sequences, which reduces the accuracy of rhythm recognition.

[0139] CKDAN, which introduces a large language model to analyze the input information within a sequence, is the best-performing existing method.

[0140] In contrast, the model proposed in this invention captures correlations between keystrokes through rhythm attention and feature attention, separates words and phrases at the text level, and uses keyboard encoding for feature augmentation, with results computed in parallel by multiple algorithms. Through multi-level learning, the model fully exploits the user's keystroke rhythm and outperforms all compared methods, achieving EERs of 1.58% on the Buffalo dataset and 4.23% on the Clarkson II dataset.

[0141] Table 1. Comparison with existing research (Buffalo dataset)

[0142]
Algorithm            EER
SVM                  4.93%
Gunetti & Picardi    3.75%
KDE                  1.95%
CNN+RNN              2.67%
TypeNet              7.6%
CKDAN                1.67%
This invention       1.58%

[0143] Table 2. Comparison with existing studies (Clarkson II dataset)

[0144]
Algorithm            EER
SVM                  15.67%
Gunetti & Picardi    10.36%
KDE                  7.59%
TKCA                 8.28%
CNN+RNN              5.97%
TypeNet              17.2%
CKDAN                4.41%
This invention       4.23%

Claims

1. A keystroke-based persistent authentication method based on keyboard position function encoding and attention, characterized in that the specific steps are as follows: Step 1: The user logs into the system and keystroke data are collected; the system obtains basic keystroke data by listening for keyboard keystroke events and capturing the key timestamps. Step 2: Vectorize the basic keystroke data: pair the press and release events to form complete single-key events, then calculate the various time intervals between adjacent keys to obtain keystroke vector data; the keystroke vector data are then concatenated along the time step in order of their original press timestamps to form complete single-key input sequence data. Step 3: Enhance the features of the single-key input sequence data through keyboard position function encoding; the keyboard position function encoding is based on each user's data and includes the following 7 fields: location marker: indicates whether the key has fixed horizontal and vertical coordinates; frequency-of-use indicator: keys are graded by the frequency and importance of key presses during input, reflecting the user's proficiency; vertical position: marks the row in which the key is located, numbered upward starting from the bottom (space-bar) row; horizontal position: numerical values describing the left and right edges indicate the horizontal coordinate of the key; left/right hand area division: keys are categorized by whether they are used by the left or the right hand; word/sentence breakpoint end marker: marks keys used as word-segmentation breakpoints; function area label: categorizes keys by function. Step 4: Divide the keystroke data into three input data sequences of different lengths using the multi-sliding sequence method, and input each sequence into the neural network model.
The multi-sliding sequence method includes the following three steps: Step 4.1: Divide the single-key input sequence data into data components of fixed length L to generate sequence 1; Step 4.2: Based on the user input, extract word sequences from the single-key input sequence data using an adaptive word segmentation method to generate sequence 2; Step 4.3: Divide the single-key input sequence data into individual data items represented by three-key combinations to generate sequence 3. Step 5: Train the neural network model on the input data to obtain a user-specific model, the structure of the neural network model being as follows: Step 5.1: Standardize all input data using the user's mean and standard deviation; Step 5.2: Input the output of the standardization operation into the feature attention module; Step 5.3: Input the output of the feature attention module into the rhythm attention module, whose specific steps are as follows: Step 5.3.1: Receive the input data H and add the positional encoding function PE along the time dimension to obtain the input feature matrix $H_{input}$; Step 5.3.2: Map each sequence of the input feature matrix $H_{input}$ through a fully connected mapping into a lower-dimensional space for dimensionality reduction and compression, producing the feature representation $H_{dense}$; Step 5.3.3: For character-level features, use $H_{input}$ as the input matrix $H_{char}$; for word-level and phrase-level features, extract features with one-dimensional convolutions of different kernel sizes to obtain the input matrices $H_{word}$ and $H_{phrase}$; Step 5.3.4: Multiply the obtained matrices by the learnable weight matrices $W^{Q}$, $W^{K}$, and $W^{V}$ to generate the Q, K, and V vectors, and compute the attention weights by scaled dot product to obtain the attention output score matrix $A_l$; Step 5.3.5: Multiply the attention output score matrix $A_l$ by V to obtain the output $O_l$, and standardize the feature distribution through layer normalization; Step 5.3.6: Concatenate the attention outputs at the character, word, and phrase levels to obtain the final computation matrix $H_{concat}$. Step 5.4: Process the output of the rhythm attention module through multiple fully connected layers with Dropout. Step 5.5: Apply an activation function to the resulting output to obtain the model output probability.
Step 6: Enter the identity verification stage: process the data in the same way as in the keystroke data collection stage, including event collection, data vectorization, and feature expansion, then use the user-specific model to compute the model output probability through matrix operations.
Step 7: Process the model output probability with a continuous verification algorithm to obtain the final identity verification probability; the continuous verification algorithm includes the following three mechanisms: 1) Dynamic weighting mechanism: after each keystroke, the neural network model outputs a recognition probability $p_w$ (w being the current time step); only the probability sequence of the most recent W keystrokes, $\{p_{w-W+1}, \dots, p_w\}$, is retained, and the recognition probability at each time point is assigned an exponentially decaying weight $\omega_t$ that decays over time; in addition, an outlier-removal fault-tolerance mechanism is introduced: among the W keystroke probabilities, the n largest and the n smallest values (the outliers) are removed, and a weighted average probability is calculated; 2) Dynamic fluctuation mechanism: if the recognition probability fluctuates beyond a set threshold within a certain period, this indicates that the identity has changed; a weighted fluctuation value is used to perform short-term fluctuation detection on the probability sequence within window W, and if the set threshold is exceeded, the verification probability is fluctuating sharply and an anomaly has occurred, which triggers an "identity suspicion" flag, logs the event, and requires further multi-round authentication; attack pattern identification: building on fluctuation detection, a low-probability warning threshold $P_{min}$ is set, and if more than half of the probabilities $p_i$ within window W satisfy $p_i < P_{min}$, the system's "suspicious behavior" flag is triggered immediately, the event is logged, and further multi-round authentication is required, so that cases in which an impersonating intruder enters destructive commands are identified quickly, since the intruder's keystroke characteristics differ significantly from those of the original user and the recognition probability drops rapidly; 3) Confidence assessment mechanism: a. mean and standard deviation of the most recent recognition probabilities: the confidence $\theta_t$ is calculated dynamically from the data in sliding window W; b. minimum confidence requirement: ensure that the confidence $\theta_t$ is not lower than the system-defined minimum confidence $\theta_{min}$; adaptive adjustment is performed based on the authentication results within the sliding window: when the system's confidence $\theta_t$ in the user is high, $\theta_{min}$ is decreased, and when $\theta_t$ is low, $\theta_{min}$ is increased, which improves the rigor of the system.

2. The keystroke persistent authentication method based on keyboard position function encoding and attention according to claim 1, characterized in that, in Step 4, the specific steps of the multi-sliding sequence method are as follows: Step 4.1: Divide the single-key input sequence data into data components of fixed length L to generate sequence 1; Step 4.2: Based on the user input, extract word sequences using an adaptive word segmentation method, dividing the single-key input sequence data to generate sequence 2; the adaptive word segmentation method consists of the following three rules: breakpoint markers: when the input key carries a breakpoint marker in the keyboard position function encoding, the sequence is split automatically; interval time: if the time interval between two keys exceeds the threshold, the system automatically identifies it as a separation point and performs word segmentation; sequence length: when the length of sequence 2, i.e., the number of keystroke vectors in sequence 2, exceeds the fixed length of sequence 1, the system automatically divides it into independent subsequences; Step 4.3: Divide the single-key input sequence data into individual data items composed of three-key combinations to generate sequence 3.

3. The keystroke persistent authentication method based on keyboard position function encoding and attention according to claim 1, characterized in that, in Step 5, the specific steps of the feature attention module are as follows: Step 5.2.1: Receive the input data X and, for each pair of adjacent keystroke sequence vectors $x_t$ and $x_{t+1}$, concatenate their features to form a new feature vector $z_t$; Step 5.2.2: Project the feature vector $z_t$ through a fully connected layer to generate the query vector $Q_t$, achieving dimensionality reduction; Step 5.2.3: Multiply the feature vector $z_t$ by the learnable weight matrices $W_{k1}$ and $W_{v1}$ to generate the key vector $K_t$ and the value vector $V_t$, respectively; Step 5.2.4: Compute the dot product of the query vector $Q_t$ and the transpose of the key vector $K_t$ to obtain the general attention weight $A_t$, normalized with Softmax; Step 5.2.5: Multiply the attention weight $A_t$ by the transpose of the value vector $V_t$ to generate a new feature vector $h_t$; Step 5.2.6: Stack the feature vectors $h_t$ of all time steps to form a new output sequence H.