Mobile game user behavior analysis method and system based on big data analysis
By constructing a behavioral sample library, a statistical behavioral baseline, and a deep behavior pattern library, and fusing their outputs with a Bayesian fusion method, the invention addresses the limited accuracy and real-time performance of user behavior analysis in existing technologies, improving both for mobile games.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- WUHAN AIYA TECHNOLOGY CO LTD
- Filing Date
- 2026-04-01
- Publication Date
- 2026-06-30
AI Technical Summary
Existing mobile game user behavior analysis methods are insufficient to accurately assess users' actual operational proficiency, fail to consider the temporal correlation and synchronicity of touch operations, and lack in-depth mining of massive amounts of historical user behavior data, resulting in significant biases in the analysis results.
We collect historical touch operation data from a massive amount of mobile game users, perform time-series slicing and event alignment processing, and construct a behavior sample library. Based on the behavior sample library, we statistically analyze the distribution parameters of multi-finger operation synchronization indicators under each proficiency level to construct a statistical behavior baseline. We cluster operation patterns, extract typical operation pattern prototypes, and construct a deep behavior pattern library. We collect current user data in real time for deviation comparison and similarity matching, and obtain operation proficiency level and focus score through Bayesian fusion method.
It enables comprehensive evaluation of user behavior from multiple perspectives, improving the accuracy and real-time nature of the analysis, and providing reliable technical support for personalized game recommendations and anti-cheating detection.
Abstract figure: CN122309997A_ABST
Description
Technical Field
[0001] This invention relates to the field of big data analytics, and in particular to a method and system for analyzing user behavior in mobile games based on big data analytics.
Background Technology
[0002] With the rapid development of mobile internet technology, mobile games have become an important form of daily entertainment for people. The massive amount of behavioral data from game users contains rich user characteristic information. Accurate analysis of this behavioral data is of great significance for game developers to optimize game design, improve user experience, and achieve personalized operation.
[0003] Currently, existing mobile game user behavior analysis methods mostly rely on macro-level data such as user game rank and win rate, or simply statistically analyze basic indicators such as the number and duration of user touch operations. These methods struggle to accurately assess users' actual operational proficiency and fail to consider the temporal correlation and synchronicity of touch operations, making it impossible to capture differences in user operation patterns and distinguish the operational characteristics of users with different skill levels. Furthermore, existing methods lack in-depth mining of massive amounts of historical user behavior data, resulting in a lack of reference for current user behavior analysis and significant bias in the analysis results.
Summary of the Invention
[0004] This invention addresses the technical problems of insufficient micro-operation feature mining, inaccurate behavior evaluation, and inadequate indicator integration in existing mobile game user behavior analysis methods, and provides a mobile game user behavior analysis method and system based on big data analysis.
[0005] The technical solution by which the present invention solves the above technical problems is as follows. In a first aspect, the present invention provides a method for analyzing mobile game user behavior based on big data analysis, including: collecting massive amounts of historical touch operation data from mobile game users, and performing time-series slicing and event alignment processing on the historical touch operation data to construct a behavior sample library; based on the behavior sample library, statistically analyzing the distribution parameters of the multi-finger operation synchronization index under each proficiency level to construct a statistical behavior baseline; clustering the operation patterns in the behavior sample library to extract typical operation pattern prototypes, and constructing a deep behavior pattern library based on the typical operation pattern prototypes; collecting the current user's touch operation data in real time to generate a current time segment; comparing the current time segment with the statistical behavior baseline for deviation and matching it against the deep behavior pattern library for similarity, to obtain a deviation comparison result and a similarity matching result; inputting the current time segment into a pre-trained time-series prediction model, which outputs predicted values of the proficiency level and the focus score; and fusing the deviation comparison result, the similarity matching result, the proficiency level prediction value, and the focus score prediction value using a Bayesian fusion method to obtain the current user's operation proficiency level and focus score.
[0006] In a second aspect, the present invention provides a mobile game user behavior analysis system based on big data analysis, comprising: a behavior sample construction module, configured to collect historical touch operation data of massive numbers of mobile game users and to perform time-series slicing and event alignment processing on the historical touch operation data to construct a behavior sample library; a statistical behavior baseline construction module, configured to statistically analyze, based on the behavior sample library, the distribution parameters of the multi-finger operation synchronization index under each proficiency level and to construct the statistical behavior baseline; a deep behavior pattern library construction module, configured to cluster the operation patterns in the behavior sample library, extract typical operation pattern prototypes, and construct a deep behavior pattern library based on the typical operation pattern prototypes; a real-time user behavior analysis module, configured to collect the current user's touch operation data in real time, generate the current time segment, compare the current time segment with the statistical behavior baseline for deviation, and match it against the deep behavior pattern library for similarity, to obtain the deviation comparison result and the similarity matching result; a time-series prediction module, configured to input the current time segment into a pre-trained time-series prediction model and output the predicted values of the proficiency level and the focus score; and a fusion analysis module, configured to fuse the deviation comparison result, the similarity matching result, the proficiency level prediction value, and the focus score prediction value using a Bayesian fusion method to obtain the current user's operation proficiency level and focus score.
[0007] The beneficial effects of this invention are as follows. Compared with existing technologies, this application first constructs a behavior sample library containing touch-event aligned sequences and labels, by collecting massive amounts of historical touch operation data and performing time-series slicing and event alignment on it. This provides high-quality foundational data for subsequent statistical analysis, pattern clustering, and prediction model training. Second, based on the behavior sample library, the distribution parameters of the multi-finger operation synchronization indicators under each proficiency level are statistically analyzed to construct a statistical behavior baseline, which provides a quantitative benchmark for the stability and proficiency of user operations. Third, typical operation pattern prototypes are extracted by clustering operation patterns to construct a deep behavior pattern library, which abstracts discrete operation data into interpretable pattern prototypes and provides a structured reference for real-time behavior matching. Then, the current user's data is collected in real time to generate the current time segment, and deviation comparison against the statistical behavior baseline and similarity matching against the deep behavior pattern library are performed to obtain multi-dimensional behavioral features. At the same time, a pre-trained time-series prediction model uses the temporal information to output predicted proficiency levels and focus scores. Finally, the multi-source evidence is fused using a Bayesian fusion method to obtain the current user's operational proficiency level and focus score, achieving a comprehensive multi-dimensional evaluation of user behavior.
[0008] Through the above technical solutions, this application constructs a complete analysis chain from historical data mining and behavioral pattern extraction to real-time multi-source evidence fusion, effectively solving the problems of coarse granularity, poor real-time performance, and difficulty in quantifying focus in existing methods. It improves the accuracy and real-time performance of mobile game user behavior analysis and provides reliable technical support for personalized game recommendations, skill assessment, and anti-cheating detection.
Attached Figure Description
[0009] Figure 1 is a flowchart of the mobile game user behavior analysis method based on big data analysis provided by this invention; Figure 2 is a schematic structural diagram of the mobile game user behavior analysis system based on big data analysis provided by this invention.
[0010] In the accompanying drawings, the components represented by each reference numeral are as follows: behavioral sample construction module 11, statistical behavioral baseline construction module 12, deep behavioral pattern library construction module 13, real-time user behavior analysis module 14, time series prediction module 15, and fusion analysis module 16.
Detailed Implementation
[0011] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0012] In the description of this invention, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of the stated features. In the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.
[0013] In the description of this invention, the term "for example" is used to mean "used as an example, illustration, or description." Any embodiment described as "for example" in this invention is not necessarily to be construed as being more preferred or advantageous than other embodiments. The following description is provided to enable any person skilled in the art to make and use the invention. Details are set forth in the following description for purposes of explanation. It should be understood that those skilled in the art will recognize that the invention can be made without using these specific details. In other instances, well-known structures and processes will not be described in detail to avoid obscuring the description of the invention with unnecessary detail. Therefore, the invention is not intended to be limited to the embodiments shown, but is consistent with the broadest scope of the principles and features disclosed herein.
[0014] Embodiment 1: As shown in Figure 1, this embodiment of the invention provides a method for analyzing mobile game user behavior based on big data analysis, including: S10: collecting a large amount of historical touch operation data from mobile game users, and performing time-series slicing and event alignment processing on the historical touch operation data to build a behavior sample library.
[0015] Mobile game user behavior analysis requires a large amount of labeled data to train models and establish behavioral baselines. However, raw touch data is a continuous, unstructured stream of events, and the operation patterns vary greatly among different users and in different game contexts. If raw data is used directly, it is difficult to establish a correlation between touch operations and game results or user attention.
[0016] At the same time, there is a time delay between user actions and in-game events. If they are not aligned, it will lead to a misalignment of cause and effect between actions and events, affecting the accuracy of subsequent analysis.
[0017] To address the aforementioned issues, this application collects a massive amount of historical touch operation data from mobile game users, and performs time-series slicing and event alignment processing on the historical touch operation data to construct a behavior sample library.
[0018] Specifically, step S10 in the method includes: collecting massive amounts of raw touch event streams generated by mobile game users during historical gameplay, the raw touch event streams including at least the touch point coordinates, pressure value, and touch start and end timestamps of each touch event; collecting in-game event tags and user-rated focus tags that are time-synchronized with the raw touch event stream, wherein the in-game event tags include at least victory event tags, defeat event tags, and level completion event tags; time-slicing the raw touch event stream according to a preset fixed-duration window to obtain multiple time segments; for each time segment, performing a time-series correlation analysis between the touch event sequence within that time segment and the in-game event tag sequence within the same time window to obtain a touch-event alignment sequence; and taking the touch-event alignment sequence, together with the in-game event tags and user-rated focus tags associated with that time segment, as one behavioral sample, the collection of all behavioral samples forming the behavioral sample library.
[0019] In this embodiment, a massive amount of raw touch event streams generated by mobile game users during historical gameplay are first collected. These raw touch event streams include at least the touch point coordinates, pressure value, and touch start / end timestamps for each touch event. The raw touch event stream refers to the continuous sequence of all touch events generated by the user through the mobile phone touchscreen during gameplay, representing the raw data reflecting the user's actions. Touch point coordinates refer to the specific location of the touch operation on the mobile phone screen, such as x-axis and y-axis coordinates, reflecting the user's preferred operation position. Pressure value refers to the amount of pressure applied when the user touches the screen, reflecting the force characteristics of the user's operation. Touch start / end timestamps refer to the start and end times of the touch operation, reflecting the duration and temporal characteristics of the user's operation.
[0020] For example, historical game data of 10,000 mobile game users with different skill levels are collected. Each user's original touch event stream includes the (x,y) coordinates of each touch, the pressure level (e.g., 0-100), the start timestamp, and the end timestamp.
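To illustrate the per-event fields described above, the following Python sketch models one raw touch event. The class name `TouchEvent`, its field names, the 0-100 pressure scale, and the use of seconds for timestamps are assumptions for illustration, not part of the original disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TouchEvent:
    """One raw touch event (hypothetical field names)."""

    x: float         # touch point x-coordinate on the screen
    y: float         # touch point y-coordinate on the screen
    pressure: float  # pressure level, assumed to lie in 0-100
    start: float     # touch start timestamp, in seconds
    end: float       # touch end timestamp, in seconds

    def __post_init__(self) -> None:
        # Reject records that violate the assumed value ranges.
        if not 0.0 <= self.pressure <= 100.0:
            raise ValueError("pressure must be within 0-100")
        if self.end < self.start:
            raise ValueError("end timestamp precedes start timestamp")

    @property
    def duration(self) -> float:
        # Duration of the touch, derived from its start/end timestamps.
        return self.end - self.start
```

For instance, `TouchEvent(540.0, 1200.0, 55.0, 3.0, 3.25).duration` evaluates to `0.25`.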
[0021] Secondly, in-game event tags and user-rated focus tags, synchronized in time with the original touch event stream, are collected. In-game event tags include at least victory, defeat, and level completion tags. Time synchronization means that the timestamps of the in-game event tags and user-rated focus tags correspond to the timestamps of the original touch event stream, ensuring a temporal correlation among the three. In-game event tags are markers of key events occurring during game execution, reflecting the relationship between user actions and game outcomes. User-rated focus tags are quantitative scores given by users after the game to assess their own focus during gameplay, serving as supervisory tags for subsequent model training. Thus, by collecting these two types of tags, a correlation can be established between touch operations, game events, and focus, providing supervisory data for subsequent pattern discovery and model training.
[0022] For example, when a user completes a game level, a level completion event tag is generated, with the timestamp matching the completion time; after the game ends, the user rates their focus at 8 points, generating a focus tag and associating it with the touch event stream of the corresponding game period.
[0023] Next, the original touch event stream is time-sliced according to a preset fixed-length window to obtain multiple time segments. The preset fixed-length window refers to the time length used to segment the original touch event stream, set according to the game type and operation characteristics, such as 10s, 20s, 30s, etc. The selection of the fixed duration must balance the completeness of operation characteristics and analysis efficiency: a window that is too short may result in the truncation of individual operations, failing to fully capture continuous operation patterns such as multi-finger collaboration; a window that is too long will contain too much redundant information in the time segment, increasing computational load and diluting the representativeness of operation characteristics. Those skilled in the art can set the duration according to the operation rhythm and typical action duration of the specific game.
[0024] For example, for action-intensive real-time battle games, the typical duration of a series of consecutive actions is about 5 to 10 seconds, and the preset fixed duration window can be set to 10 seconds; for slower-paced strategy games, the typical duration of an action sequence is about 15 to 20 seconds, and the preset fixed duration window can be set to 20 seconds.
[0025] For example, if the preset fixed duration window is 20 seconds, a 120-second original touch event stream is divided into 6 time segments, each segment containing all touch events within 20 seconds.
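The fixed-duration slicing can be sketched as below, assuming each event is reduced to a hypothetical `(start, end)` pair in seconds and assigned to the half-open window that contains its start timestamp. How events that straddle a window boundary should be handled is not specified in the text; this sketch simply does not split them.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical minimal event representation: (start_timestamp_s, end_timestamp_s).
Event = Tuple[float, float]


def slice_into_windows(
    events: Sequence[Event],
    window: float,
    stream_start: float,
    stream_end: float,
) -> List[List[Event]]:
    """Split a touch event stream into fixed-duration, half-open windows.

    Each event goes to the window containing its start timestamp
    (an assumption; events spanning a boundary are not split).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n_windows = max(1, math.ceil((stream_end - stream_start) / window))
    segments: List[List[Event]] = [[] for _ in range(n_windows)]
    for event in sorted(events, key=lambda e: e[0]):
        if not stream_start <= event[0] < stream_end:
            continue  # ignore events outside the stream interval
        index = int((event[0] - stream_start) // window)
        segments[index].append(event)
    return segments
```

With `window=20.0` over a 0-120 s stream, the function returns 6 segments, matching the example above.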
[0026] Furthermore, for each time segment, a temporal correlation analysis is performed between the touch event sequence within that time segment and the in-game event tag sequence within the same time window to obtain a touch-event alignment sequence. Here, the touch event sequence refers to the sequence formed by arranging all touch events within the time segment in chronological order; the in-game event tag sequence refers to the sequence formed by arranging all in-game event tags within the same time window in chronological order; temporal correlation analysis involves mining the temporal correlation between touch events and in-game events, determining the response delay between them, and achieving temporal alignment; the touch-event alignment sequence refers to the sequence where, after alignment processing, touch events and their corresponding in-game events are synchronized on the timeline.
[0027] Finally, the touch-event alignment sequence, along with the associated in-game event tags and user-rated focus tags, is taken as a behavioral sample, and the collection of all behavioral samples forms the behavioral sample library. Each behavioral sample contains the touch operations, corresponding game events, and focus information within a fixed duration, and therefore carries a complete "operation-event-focus" correlation. The behavioral sample library covers a large number of behavioral samples from different users, game scenarios, and operational states, providing comprehensive and diverse foundational data for subsequent baseline construction, pattern mining, and model training, and thus the data support needed for the subsequent steps.
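A behavioral sample and the library assembly step might be sketched as follows. The record types, the class `BehaviorSample`, and the function `build_sample_library` are hypothetical; the sketch also assumes one game session per call, so every segment of that session shares the session's user-rated focus score.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical element types for illustration.
AlignedTouch = Tuple[float, float]  # (aligned timestamp in s, pressure)
EventTag = Tuple[float, str]        # (timestamp in s, tag such as "victory")


@dataclass
class BehaviorSample:
    """One "operation-event-focus" sample for a single time segment."""

    segment_index: int
    aligned_touches: List[AlignedTouch]
    event_tags: List[EventTag]
    focus_score: Optional[float]  # user-rated focus for the session, if any


def build_sample_library(
    aligned_segments: Sequence[Sequence[AlignedTouch]],
    event_tags: Sequence[EventTag],
    focus_score: Optional[float],
    window: float,
    stream_start: float = 0.0,
) -> List[BehaviorSample]:
    """Pair each aligned segment with the event tags in the same window."""
    library: List[BehaviorSample] = []
    for i, touches in enumerate(aligned_segments):
        lo = stream_start + i * window
        hi = lo + window
        tags = [tag for tag in event_tags if lo <= tag[0] < hi]
        library.append(BehaviorSample(i, list(touches), tags, focus_score))
    return library
```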
[0028] Specifically, performing, for each time segment, a temporal correlation analysis between the touch event sequence within that time segment and the in-game event tag sequence within the same time window to obtain a touch-event alignment sequence includes: extracting feature change points from the touch event sequence, wherein the feature change points include at least one of the following: abrupt change points of pressure intensity, acceleration extreme points of the touch point coordinates, and points at which the number of touch points increases or decreases; taking the moment at which an in-game event tag appears (from no tag) or changes from one state to another as the event occurrence time; calculating the cross-correlation function between the feature change points and the event occurrence times, sliding the time offset to find the offset that maximizes the cross-correlation function value, and taking that time offset as the response delay between the touch events and the game events; and using the response delay as the alignment offset, shifting the touch event sequence backward or forward by the response delay so that the touch events are aligned with the corresponding game events on the timeline, thus obtaining the touch-event alignment sequence.
[0029] In this embodiment, feature change points are first extracted from the touch event sequence. These feature change points include at least one of the following: abrupt changes in pressure, extreme acceleration points of touch point coordinates, and changes in the number of touch points. Feature change points refer to key time points in the touch event sequence that reflect changes in the user's operational state.
[0030] Specifically, the abrupt change point of pressure intensity refers to the point where the pressure intensity changes significantly within a short period of time, such as suddenly increasing from 20 to 80; the acceleration extreme point of the touch point coordinates refers to the point where the acceleration of touch point movement reaches a maximum or minimum, reflecting a change in the speed of the user's operation; the point of increase or decrease in the number of touch points refers to the point where the number of touch points changes from n to n±1, reflecting the user adding or lifting a finger, for example when switching into or out of multi-finger operation.
[0031] For example, points where the pressure intensity changes abruptly from 30 to 75 within a certain time segment, and points where the acceleration of the touch point coordinates reaches its maximum value, are extracted as feature change points.
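Two of the change-point types can be sketched as follows, assuming the touch stream has already been resampled into parallel lists of timestamps, pressure values, and simultaneous touch counts. The jump threshold of 30 pressure units is a hypothetical choice (the example above jumps by 45), and the acceleration-extreme points are omitted for brevity.

```python
from typing import List, Sequence


def pressure_change_points(
    times: Sequence[float], pressures: Sequence[float], threshold: float = 30.0
) -> List[float]:
    """Times at which pressure jumps by at least `threshold` between consecutive samples."""
    return [
        times[i]
        for i in range(1, len(pressures))
        if abs(pressures[i] - pressures[i - 1]) >= threshold
    ]


def touch_count_change_points(times: Sequence[float], counts: Sequence[int]) -> List[float]:
    """Times at which the number of simultaneous touch points changes (e.g. n -> n±1)."""
    return [times[i] for i in range(1, len(counts)) if counts[i] != counts[i - 1]]


def feature_change_points(
    times: Sequence[float], pressures: Sequence[float], counts: Sequence[int]
) -> List[float]:
    """Union of both kinds of change points, sorted and de-duplicated."""
    return sorted(
        set(pressure_change_points(times, pressures))
        | set(touch_count_change_points(times, counts))
    )
```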
[0032] Secondly, the moment when an in-game event tag appears or changes from one state to another is defined as the event occurrence time. Specifically, "appearing from none" refers to the point in time when an event tag such as "victory," "defeat," or "level completion" appears in a state where there were no event tags initially (e.g., changing from no tag to a "level completion" tag); "changing from one state to another" refers to the point in time when an event tag switches (e.g., changing from a "defeat" tag to a "victory" tag); and "event occurrence time" refers to the specific time when a key in-game event occurs, which is the core time node linking touch operations and game events.
[0033] For example, if a level completion tag appears at t=10s within a time segment, that is, the tag changes from no tag to the level completion tag, then t=10s is the event occurrence time.
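The rule for event occurrence times can be sketched as a scan over a sampled tag sequence, assuming each sample is a `(timestamp, tag)` pair in which `None` means no tag is present. A tag disappearing is not counted, since the text defines an occurrence only as a tag appearing or switching. The function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def event_occurrence_times(tag_series: Sequence[Tuple[float, Optional[str]]]) -> List[float]:
    """Return times at which a tag appears from no tag or switches to a different tag."""
    times: List[float] = []
    previous: Optional[str] = None
    for t, tag in tag_series:
        if tag is not None and tag != previous:
            times.append(t)
        previous = tag  # a tag disappearing (tag -> None) is not an occurrence
    return times
```

For a series that goes from no tag to `"level_complete"` at 10.0 s, the first returned time is 10.0, as in the example above.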
[0034] Next, the cross-correlation function between the feature change point and the event occurrence time is calculated. The offset that maximizes the cross-correlation function value is found by sliding the time offset, and this time offset is used as the response delay between the touch event and the game event. The cross-correlation function measures the correlation between two time series; a larger cross-correlation function value indicates a stronger correlation between the two time series. The sliding time offset refers to the length of time the touch event sequence is shifted forward or backward relative to the event occurrence time. The response delay refers to the time interval between the user's touch operation and the in-game event's response, reflecting the temporal correlation between the touch operation and the game event.
[0035] For example, the cross-correlation function between the feature change points and the event occurrence times is calculated as:
$$R(\tau)=\sum_{t} C(t)\,E(t+\tau)$$
where C(t) is the feature change point sequence (value 1 at a feature change point and 0 elsewhere), E(t) is the event occurrence time sequence (value 1 at an event occurrence time and 0 elsewhere), and τ is the sliding time offset. By iterating over a preset time offset range, the cross-correlation value R(τ) is calculated for each time offset, and the τ that maximizes R(τ) is taken as the response delay. The time offset range must cover all plausible response delays between touch operations and game events, and can be set according to the game type and operation response speed. For example, for action games with high real-time requirements, the response delay is usually in the millisecond-to-second range, so the range can be set to -5s to 5s; for strategy games, the response delay may be longer, so the range can be widened accordingly. Those skilled in the art can also analyze the distribution of time differences between feature change points and event occurrence times in historical data, and take the interval covering more than 95% of the samples as the time offset range.
[0036] For example, when calculating the cross-correlation function between the feature change point and the event occurrence time (t=10s), the cross-correlation function value is the largest when the sliding time offset is 0.5s, and the response delay is 0.5s.
[0037] Finally, the response delay is used as the alignment offset to shift the touch event sequence backward or forward, aligning the touch events with the corresponding game events on the timeline and producing the touch-event alignment sequence. If the response delay is positive, the touch operation change precedes the game event, and the touch event sequence is shifted backward (later) by the response delay; if the response delay is negative, the game event precedes the touch operation change, and the touch event sequence is shifted forward (earlier) by the magnitude of the response delay. After alignment, the feature change points of the touch events and the occurrence times of the game events are synchronized on the timeline, establishing a precise association between touch events and game events.
[0038] For example, if the response delay is 0.5s, the touch event sequence is shifted backward by 0.5s so that the feature change point is aligned with the event occurrence time (t=10s) to obtain the touch-event aligned sequence.
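The delay search and the shift described in [0034] to [0038] can be sketched together as follows. This is a sketch under stated assumptions: timestamps are in seconds, both indicator sequences are discretized on a hypothetical 0.1 s grid, the default search range of -5 s to 5 s follows the action-game example above, and ties in the correlation score are broken in favor of the smallest absolute lag. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def estimate_response_delay(
    change_times: Sequence[float],
    event_times: Sequence[float],
    dt: float = 0.1,
    max_lag: float = 5.0,
) -> float:
    """Return the lag tau (s) maximizing R(tau) = sum_t C(t) * E(t + tau).

    C and E are binary indicator sequences of change points and event
    occurrences on a grid of step `dt`; a positive result means touch
    changes precede game events.
    """
    c_bins = {round(t / dt) for t in change_times}
    e_bins = {round(t / dt) for t in event_times}
    max_k = round(max_lag / dt)
    best_k, best_r = 0, -1
    for k in range(-max_k, max_k + 1):
        # R at lag k: number of change points c with an event at c + k.
        r = sum(1 for c in c_bins if c + k in e_bins)
        # Prefer the higher score; on ties, prefer the smaller |lag|.
        if r > best_r or (r == best_r and abs(k) < abs(best_k)):
            best_k, best_r = k, r
    return best_k * dt


def align_touches(
    touches: Sequence[Tuple[float, float]], delay: float
) -> List[Tuple[float, float]]:
    """Shift (start, end) timestamps by the delay: later if positive, earlier if negative."""
    return [(start + delay, end + delay) for start, end in touches]
```

With a change point at 9.5 s and an event at 10.0 s, the estimated delay is 0.5 s, and shifting by it moves the change point onto the event time, as in the example above.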
[0039] In summary, compared with existing technologies, this application collects massive amounts of historical touch operation data from mobile game users and performs time-series slicing and event alignment processing on it to construct a behavior sample library. By transforming continuous raw touch streams into structured time-series samples labeled with game events and focus levels, it provides high-quality and diverse foundational data for subsequent statistical analysis, pattern clustering, and prediction model training, supporting the accuracy and generalization ability of behavior analysis.
[0040] S20: Based on the aforementioned behavioral sample library, statistically analyze the distribution parameters of the multi-finger operation synchronization index for each proficiency level, and construct a statistical behavioral baseline.
[0041] Multi-finger operation coordination differs significantly among users of different skill levels: highly skilled players tend to exhibit more stable multi-finger operation patterns, such as smaller press timing differences and more stable touch point spacing, while less skilled players show greater fluctuations. To turn these differences into a comparable benchmark, the distribution characteristics of the multi-finger operation synchronization indicators must be statistically analyzed by skill level from the behavioral sample library, forming a statistical baseline for each level.
[0042] To address the aforementioned issues, this application, based on the aforementioned behavioral sample library, statistically analyzes the distribution parameters of multi-finger operation synchronization indicators for each proficiency level and constructs a statistical behavioral baseline. Thus, the proficiency level can be preliminarily determined by calculating the degree of deviation between the current user's operation and the baseline for each level.
[0043] Specifically, step S20 in the method includes: extracting, from the behavior sample library, the user proficiency level corresponding to each behavior sample, wherein the user proficiency levels are pre-divided into multiple discrete levels based on the user's game rank data, game win rate data, or operation score data; for each user proficiency level, selecting from the behavior sample library all behavior samples belonging to that level to form the behavior sample subset corresponding to that level; for each behavior sample in each behavior sample subset, calculating its multi-finger operation synchronization index, which includes at least one of the following: the average number of simultaneous touch points, the variance of the number of simultaneous touch points, the standard deviation of the time differences between multiple finger presses, the average time difference between multiple finger lifts, and the stability parameter of the relative distance between touch points; statistically analyzing the multi-finger operation synchronization indices of all behavior samples under each user proficiency level, calculating the arithmetic mean and standard deviation of each index, and using them as the distribution parameters of that index under that proficiency level; and combining the distribution parameters of all user proficiency levels to form the statistical behavior baseline.
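The baseline aggregation in step S20 can be sketched as a group-by over proficiency levels. Each input item is assumed to be a `(level, {index_name: value})` pair, and the population standard deviation is an assumption, since the text does not say which estimator is used. The `nearest_level` helper illustrates one possible reading of the "degree of deviation" in [0042], namely the level with the smallest mean absolute z-score; that rule and all names here are hypothetical.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

Baseline = Dict[str, Dict[str, Tuple[float, float]]]  # level -> index -> (mean, std)


def build_baseline(samples: Iterable[Tuple[str, Mapping[str, float]]]) -> Baseline:
    """Aggregate per-sample synchronization indices into (mean, std) per level."""
    grouped: Dict[str, Dict[str, list]] = defaultdict(lambda: defaultdict(list))
    for level, indices in samples:
        for name, value in indices.items():
            grouped[level][name].append(value)
    return {
        level: {
            # Population standard deviation is an assumption; the text does not specify.
            name: (statistics.fmean(values), statistics.pstdev(values))
            for name, values in per_index.items()
        }
        for level, per_index in grouped.items()
    }


def nearest_level(current: Mapping[str, float], baseline: Baseline, eps: float = 1e-9) -> str:
    """Hypothetical deviation rule: the level with the smallest mean |z-score|."""

    def deviation(params: Mapping[str, Tuple[float, float]]) -> float:
        zs = [
            abs(current[name] - mean) / max(std, eps)
            for name, (mean, std) in params.items()
            if name in current
        ]
        return sum(zs) / len(zs) if zs else math.inf

    return min(baseline, key=lambda level: deviation(baseline[level]))
```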
[0044] In this embodiment, the user proficiency level corresponding to each behavior sample is first extracted from the behavior sample library. The user proficiency level is pre-divided into multiple discrete levels based on the user's game rank data, game win rate data, or operation rating data. The user proficiency level is a discrete classification reflecting the user's game operation ability; for example, it can be divided into 5-6 levels: Novice, Beginner, Intermediate, Expert, Master, etc. Game rank data refers to the user's official in-game rank, such as Bronze, Silver, Gold, Platinum, etc. Game win rate data is the ratio of the user's number of game wins to the total number of games played. Operation rating data refers to the in-game rating based on indicators such as the user's operation accuracy and speed. Pre-division means that before constructing the behavior sample library, each user is classified into proficiency levels based on any one or more of the above data and associated with their corresponding behavior sample.
[0045] For example, based on the user's game rank, users are divided into 5 proficiency levels: Novice (Bronze, Silver), Beginner (Gold), Intermediate (Platinum), Expert (Diamond), and Master (King), and the proficiency level label corresponding to each behavioral sample is extracted.
[0046] Secondly, for each proficiency level, all behavior samples belonging to that level are selected from the behavior sample library to form the behavior sample subset for that level. A behavior sample subset is the collection of all behavior samples at the same proficiency level; each subset corresponds to one proficiency level and reflects the operational characteristics of users at that level. Samples are selected by matching their proficiency level labels, which keeps each subset pure.
[0047] For example, all behavior samples with a proficiency level of "expert" are selected to form a subset of behavior samples corresponding to the expert level.
[0048] Next, for each behavior sample in each behavior sample subset, the multi-finger operation synchronization index is calculated. This index includes at least one of the following: the average number of simultaneous touch points, the variance of the number of simultaneous touch points, the standard deviation of the time difference between multiple finger presses, the average time difference between multiple finger lifts, and a stability parameter for the relative distance between touch points. The multi-finger operation synchronization index is a core indicator of a user's ability to coordinate multiple fingers, and it differs markedly among users of different proficiency levels. The average number of simultaneous touch points reflects how often multi-finger operations occur; more proficient users typically have a higher average. The variance of the number of simultaneous touch points reflects the stability of multi-finger operations; more proficient users typically have a smaller variance. The standard deviation of the time difference between multiple finger presses reflects press synchronicity; a smaller standard deviation indicates better synchronicity. The average time difference between multiple finger lifts reflects lift synchronicity; a smaller average indicates better synchronicity. The stability parameter for the relative distance between touch points reflects how stable the touch point positions are; the closer the parameter is to 1, the better the stability.
[0049] Furthermore, statistical analysis is performed on the multi-finger operation synchronization indices of all behavior samples at each proficiency level: the arithmetic mean and standard deviation of each index are calculated and used as the distribution parameters of that index at that level. Statistical analysis here means aggregating the same index over all samples at the same proficiency level. The arithmetic mean reflects the typical value of the index for users at that level, and the standard deviation reflects its dispersion; together, these distribution parameters describe the distribution of operational characteristics of users at that level.
[0050] For example, statistical analysis of the standard deviation of the time difference between multiple finger presses over the behavior sample subset of the advanced level yields an arithmetic mean of 0.1 s and a standard deviation of 0.02 s; these two values are the distribution parameters of this index at the advanced level.
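The first four indices can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method's actual implementation: the sample layout (a list of multi-finger events, each a list of per-finger `(down_t, up_t)` pairs in seconds), the 10 ms sampling step used to count simultaneous touch points, and the function names are all hypothetical. The text gives no formula for the distance-stability parameter, so it is left out.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical layout: one multi-finger event = list of (down_t, up_t) pairs,
# one pair per finger, times in seconds.
Event = list[tuple[float, float]]


def active_touch_counts(events: list[Event], step: float = 0.01) -> list[int]:
    """Number of touch points active at each sampling instant (step is assumed)."""
    touches = [pair for ev in events for pair in ev]
    start = min(down for down, _ in touches)
    end = max(up for _, up in touches)
    n_steps = max(1, int((end - start) / step))
    counts = []
    for k in range(n_steps):
        t = start + k * step
        counts.append(sum(1 for down, up in touches if down <= t < up))
    return counts


def sync_indices(events: list[Event], step: float = 0.01) -> dict[str, float]:
    """Compute four of the synchronization indices for one behavior sample."""
    multi = [ev for ev in events if len(ev) > 1]  # needs >= 1 multi-finger event
    counts = active_touch_counts(events, step)
    # Spread between the first and last finger press / lift within each event
    press_gaps = [max(d for d, _ in ev) - min(d for d, _ in ev) for ev in multi]
    lift_gaps = [max(u for _, u in ev) - min(u for _, u in ev) for ev in multi]
    return {
        "mean_touch_points": mean(counts),
        "var_touch_points": pvariance(counts),
        "std_press_gap": pstdev(press_gaps),
        "mean_lift_gap": mean(lift_gaps),
    }


events = [[(0.0, 0.5), (0.02, 0.52)], [(1.0, 1.4), (1.06, 1.44)]]
print(sync_indices(events))
```

Population variance and standard deviation are used here; whether the method intends the sample (n - 1) forms is not stated.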
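Assembling the statistical behavior baseline amounts to grouping index values by proficiency level and recording a (mean, standard deviation) pair per index. The sketch below assumes each sample arrives as a `(level, {index_name: value})` pair; that layout, the function name, and the toy values are illustrative only.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(samples):
    """samples: iterable of (level, {index_name: value}) pairs.

    Returns {level: {index_name: (mean, std)}}, i.e. the statistical behavior
    baseline. Population standard deviation is assumed; the text does not say
    whether the sample (n - 1) form is intended.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for level, indices in samples:
        for name, value in indices.items():
            grouped[level][name].append(value)
    return {
        level: {name: (mean(vals), pstdev(vals)) for name, vals in by_index.items()}
        for level, by_index in grouped.items()
    }


samples = [  # toy values, not data from the application
    ("Expert", {"std_press_gap": 0.08}),
    ("Expert", {"std_press_gap": 0.10}),
    ("Expert", {"std_press_gap": 0.12}),
    ("Novice", {"std_press_gap": 0.30}),
]
baseline = build_baseline(samples)
print(baseline["Expert"]["std_press_gap"])  # mean ~0.1, std ~0.0163
```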
[0051] Finally, the distribution parameters of all user proficiency levels are combined to form a statistical behavior baseline. This baseline is a collection of distribution parameters for all proficiency levels and all multi-finger operation synchronization indicators, comprehensively reflecting the operational characteristics of users with different proficiency levels. Subsequently, the current user's proficiency level can be determined by calculating the deviation between the current user's operational indicators and the distribution parameters for each level.
[0052] In summary, compared with existing technologies, this application statistically analyzes, based on the behavior sample library, the distribution parameters of the multi-finger operation synchronization indices at each proficiency level and constructs a statistical behavior baseline. Quantifying the synchronization indices of users at different proficiency levels into a statistical baseline with distribution parameters provides a quantitative benchmark for the subsequent real-time assessment of how far the current user's operation deviates from the baseline, giving proficiency judgment a data-driven basis.
[0053] S30: Cluster the operation patterns in the behavior sample library, extract typical operation pattern prototypes, and construct a deep behavior pattern library based on the typical operation pattern prototypes.
[0054] After constructing the behavioral sample library, interpretable operational patterns need to be abstracted from it for subsequent real-time matching and evaluation. However, the large number of behavioral samples and the diversity of operational patterns mean that directly storing all samples leads to low retrieval efficiency and makes it difficult to summarize representative operational features.
[0055] To address the aforementioned issues, this application performs clustering processing on the operation patterns in the behavior sample library, extracts typical operation pattern prototypes, and constructs a deep behavior pattern library based on the typical operation pattern prototypes.
[0056] Specifically, step S30 of the method includes: extracting the touch point coordinate sequence, pressure value sequence, and touch start and end timestamp sequence from each behavior sample; normalizing each of these sequences separately and then concatenating the normalized touch point coordinate sequence, normalized pressure value sequence, and normalized touch start and end timestamp sequence along the time axis to form a multidimensional temporal feature vector; dividing the multidimensional temporal feature vectors into multiple clusters; for each cluster, calculating the arithmetic mean of all multidimensional temporal feature vectors within the cluster and using that mean as the cluster's typical operation pattern prototype; and storing each typical operation pattern prototype in association with the in-game event tags and user self-rated focus tags of the behavior samples in its cluster, thereby constructing the deep behavior pattern library.
[0057] In this embodiment, the touch point coordinate sequence, pressure value sequence, and touch start / end timestamp sequence are first extracted from each behavior sample. These sequences are then normalized. Finally, the normalized sequences are concatenated along a time axis to form a multidimensional temporal feature vector. The touch point coordinate sequence, pressure value sequence, and touch start / end timestamp sequence are the core temporal data reflecting user operation characteristics in the behavior sample. Normalization maps the data of each sequence to the same interval, such as 0-1, eliminating the dimensional differences between different dimensions and preventing any single dimension from having an excessive impact on the clustering results. The multidimensional temporal feature vector is a high-dimensional vector formed by concatenating the three normalized temporal sequences along a time axis, comprehensively reflecting the temporal and multidimensional characteristics of user operations.
[0058] For example, for a given behavior sample, extract the touch point coordinate sequence (x1, y1, x2, y2, ...), the pressure value sequence (p1, p2, p3, ...), and the touch start and end timestamp sequence [(t1_start, t1_end), (t2_start, t2_end), ...]. After each sequence is normalized to the interval [0, 1], the sequences are concatenated along the time axis to form a multidimensional temporal feature vector.
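A minimal sketch of the normalization and concatenation step follows. It assumes min-max scaling per sequence and reads "concatenated along the time axis" as appending the three normalized sequences, in time order, into one flat vector; the text does not pin down the exact layout, and the function names are hypothetical.

```python
def min_max(seq):
    """Map a sequence to [0, 1]; a constant sequence maps to all zeros (assumed)."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0.0] * len(seq)
    return [(v - lo) / (hi - lo) for v in seq]


def temporal_feature_vector(coords, pressures, timestamps):
    """coords: flat [x1, y1, x2, y2, ...]; pressures: [p1, p2, ...];
    timestamps: [(t1_start, t1_end), (t2_start, t2_end), ...].

    Each sequence is normalized on its own, then the results are appended in
    time order into one flat vector (one reading of "concatenated along the
    time axis"; the exact layout is not specified in the text).
    """
    flat_times = [t for pair in timestamps for t in pair]
    return min_max(coords) + min_max(pressures) + min_max(flat_times)


vec = temporal_feature_vector(
    coords=[100, 200, 300, 400],
    pressures=[0.2, 0.6, 1.0],
    timestamps=[(0.0, 0.1), (0.5, 0.6)],
)
print(len(vec))  # 11
```

For Euclidean distances between samples to be meaningful, all vectors must have the same length, so in practice segments would be resampled to a fixed number of touches; that step is likewise an assumption here.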
[0059] Secondly, the multidimensional time-series feature vectors are divided into multiple clusters. A cluster is a set of similar multidimensional time-series feature vectors, and each cluster corresponds to a typical operation mode. The core of clustering is to group similar vectors into one class based on the similarity of feature vectors. Vectors in different classes have significant differences, thereby uncovering different user operation modes.
[0060] Then, for each cluster, the arithmetic mean of all multidimensional temporal feature vectors within the cluster is calculated and used as the cluster's typical operation pattern prototype. The arithmetic mean is taken dimension by dimension over all feature vectors in the cluster and represents their core features. The typical operation pattern prototype is a standardized representative of that operation pattern, reflecting its typical characteristics. Subsequently, the current user's operation pattern type can be determined from the similarity between the current user's operation features and each prototype.
[0061] Finally, each typical operation pattern prototype is associated with and stored in conjunction with the in-game event tags and user-rated focus tags of the behavioral samples within its respective cluster, thus constructing a deep behavior pattern library. This associated storage means binding the typical operation pattern prototype with the in-game event tags and user-rated focus tags of all samples within that cluster. This ensures that the deep behavior pattern library not only contains typical operation patterns but also the corresponding game events and focus information. The deep behavior pattern library reflects the correlation between different operation patterns and game results and focus, providing a reference for subsequent similarity matching and focus assessment of current user operation patterns.
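Computing the prototypes and binding them to their tags can be sketched as below. Cluster labels are assumed to come from the clustering step; the dictionary layout of the library and the toy tag values are illustrative assumptions.

```python
from collections import defaultdict


def build_pattern_library(vectors, labels, event_tags, focus_tags):
    """vectors[i] is a temporal feature vector, labels[i] its cluster id, and
    event_tags[i] / focus_tags[i] the tags of the same behavior sample."""
    members = defaultdict(list)
    for i, cluster in enumerate(labels):
        members[cluster].append(i)
    library = {}
    for cluster, idx in members.items():
        dim = len(vectors[idx[0]])
        # Prototype = dimension-wise arithmetic mean of the cluster's vectors
        prototype = [sum(vectors[i][d] for i in idx) / len(idx) for d in range(dim)]
        library[cluster] = {
            "prototype": prototype,
            "event_tags": [event_tags[i] for i in idx],
            "focus_tags": [focus_tags[i] for i in idx],
        }
    return library


lib = build_pattern_library(
    vectors=[[0.0, 0.0], [0.2, 0.4], [1.0, 1.0]],
    labels=[0, 0, 1],
    event_tags=["win", "loss", "win"],  # toy tags
    focus_tags=[4, 3, 5],               # toy self-rated focus scores
)
print(lib[0]["prototype"])
```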
[0062] Specifically, dividing the multidimensional temporal feature vectors into multiple clusters includes: calculating the Euclidean distance between each multidimensional temporal feature vector and every other such vector to construct a distance matrix; calculating, based on the distance matrix, the local density of each vector, where the local density equals the number of other vectors whose distance from that vector is less than a preset cutoff distance; selecting, based on the distance matrix, the minimum Euclidean distance between each vector and all vectors with a higher local density as that vector's minimum distance; drawing a decision graph with the local density of each vector as the first coordinate and its minimum distance as the second coordinate; identifying as cluster centers the vectors in the decision graph whose local density is greater than a preset density threshold and whose minimum distance is greater than a preset distance threshold; and assigning each non-center vector to the cluster of the nearest cluster center with a higher local density, thereby obtaining multiple clusters.
[0063] In this embodiment, the Euclidean distance between each multidimensional temporal feature vector and all other multidimensional temporal feature vectors is first calculated to construct a distance matrix. The Euclidean distance is the straight-line distance between two multidimensional vectors, used to measure the similarity between two feature vectors; the smaller the Euclidean distance, the more similar the two feature vectors. The distance matrix is an n×n matrix (n is the number of multidimensional temporal feature vectors), where the element in the i-th row and j-th column represents the Euclidean distance between the i-th and j-th feature vectors.
[0064] For example, the Euclidean distance is calculated as:
$$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$$
where x = (x1, x2, …, xm) and y = (y1, y2, …, ym) are two m-dimensional feature vectors.
[0065] For example, if two multidimensional temporal feature vectors are x = (0.2, 0.5, 0.8) and y = (0.3, 0.6, 0.7), the Euclidean distance between them is:
$$d(x, y) = \sqrt{(0.2 - 0.3)^2 + (0.5 - 0.6)^2 + (0.8 - 0.7)^2} = \sqrt{0.03} \approx 0.1732$$
Repeating this calculation for every pair of feature vectors yields the complete distance matrix.
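The distance calculation and the symmetric distance matrix can be sketched in plain Python; the function names are illustrative. Since Python 3.8, `math.dist(x, y)` computes the same Euclidean distance.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def distance_matrix(vectors):
    """n x n symmetric matrix with zeros on the diagonal."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # fill both halves from one computation
            dist[i][j] = dist[j][i] = euclidean(vectors[i], vectors[j])
    return dist


print(round(euclidean((0.2, 0.5, 0.8), (0.3, 0.6, 0.7)), 4))  # 0.1732
```

The quadratic cost of the full matrix is inherent to the density-peak approach described here; for very large sample libraries, subsampling or approximate neighbor search would be needed, which the text does not address.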
[0066] Secondly, the local density of each multidimensional temporal feature vector is calculated from the distance matrix: it equals the number of other vectors whose distance to that vector is less than a preset cutoff distance. The preset cutoff distance is a predefined threshold for deciding whether two feature vectors are "neighbors" and can be set from the statistics of the distance matrix; optionally, the average distance in the distance matrix can be used. Local density thus counts the neighboring feature vectors around each vector and reflects how densely populated its region of the feature space is. The higher the local density, the more similar vectors surround that feature vector, and the more likely it is to become a cluster center.
[0067] Next, based on the distance matrix, the minimum distance of each multidimensional temporal feature vector is taken as the smallest Euclidean distance between that vector and all vectors with a higher local density. The minimum distance thus measures how far the current feature vector lies from any denser region; if the current feature vector has the highest local density, its minimum distance is instead defined as the maximum Euclidean distance between that vector and all other vectors.
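The two decision-graph coordinates can be sketched as follows, operating on a precomputed distance matrix. The cutoff defaults to the mean off-diagonal distance, the optional choice named in the text. Because local density is an integer count, ties are common; breaking them by index is an added assumption so that only one vector is treated as the global density peak.

```python
def local_density(dist, cutoff=None):
    """Count of other vectors closer than the cutoff distance.

    If no cutoff is given, the mean off-diagonal distance is used, which is
    the optional choice mentioned in the text."""
    n = len(dist)
    if cutoff is None:
        off_diag = [dist[i][j] for i in range(n) for j in range(n) if i != j]
        cutoff = sum(off_diag) / len(off_diag)
    return [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]


def min_distance(dist, rho):
    """Distance from each vector to the nearest vector of higher local density.

    Ties in density are broken by index (an assumption, so that exactly one
    vector is treated as the global density peak)."""
    n = len(dist)
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        if higher:
            delta.append(min(higher))
        else:  # global density peak: use its largest distance to any other vector
            delta.append(max(dist[i][j] for j in range(n) if j != i))
    return delta


points = [0.0, 0.1, 0.2, 5.0, 5.1]  # toy 1-D data
dist = [[abs(a - b) for b in points] for a in points]
rho = local_density(dist)
print(rho)  # [2, 2, 2, 1, 1]
print(min_distance(dist, rho))
```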
[0068] Furthermore, a decision graph is plotted using the local density of each multidimensional temporal feature vector as the first coordinate and the minimum distance of each multidimensional temporal feature vector as the second coordinate. The decision graph is a scatter plot with local density on the horizontal axis and minimum distance on the vertical axis, where each scatter point represents a multidimensional temporal feature vector. The decision graph can intuitively display the local density and minimum distance distribution of all feature vectors, facilitating the identification of cluster centers.
[0069] Furthermore, multidimensional temporal feature vectors in the decision graph that simultaneously satisfy a local density greater than a preset density threshold and a minimum distance greater than a preset distance threshold are identified as cluster centers. The preset density threshold and preset distance threshold are pre-set based on the distribution characteristics of the decision graph and are used to filter feature vectors with high local density and large distances from other high-density regions. The cluster center is the core of each cluster and represents the characteristics of that cluster; all subsequent non-cluster center points will be assigned to the nearest cluster center.
[0070] For example, the preset density threshold and preset distance threshold can be determined as follows. First, observe the distribution of local density and minimum distance in the decision graph; cluster centers typically lie in the upper right region, where both values are relatively large. Then compute, for each point, the product of its local density and minimum distance, sort these products in descending order, and use the values corresponding to the inflection point of the sorted curve as the thresholds. Alternatively, based on experience, set the local density threshold to an upper quantile of all points' local densities (for example, the top 10%) and the minimum distance threshold to the median or an upper quantile of all points' minimum distances.
[0071] For example, if in the decision graph the points with a local density greater than 0.8 and a minimum distance greater than 0.5 are clearly separated from the others, the preset density threshold and the preset distance threshold can be set to 0.8 and 0.5, respectively. If the separation in the decision graph is not obvious, the thresholds can be tuned over multiple experiments until the clustering results match the actual pattern distribution.
[0072] Finally, each multidimensional temporal feature vector that is not a cluster center is assigned to the cluster of the nearest cluster center among those with a higher local density, yielding multiple clusters. A non-center vector is any feature vector not identified as a cluster center. Requiring the target center to be both nearest and of higher local density keeps the features within each cluster consistent and avoids the clustering bias that would arise from attaching points to a nearby but low-density center.
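The empirical quantile rule above can be sketched with the standard library. The choice of the "inclusive" quantile method and the function name are assumptions; the density threshold leaves roughly the top 10% of points above it and the distance threshold is the median, as in the text's example.

```python
from statistics import median, quantiles


def select_centers(rho, delta, top_fraction=0.10):
    """Pick cluster centers from decision-graph coordinates.

    Density threshold: the upper quantile leaving roughly `top_fraction` of
    points above it; distance threshold: the median minimum distance. Both
    follow the empirical rule in the text; the quantile method is an assumption."""
    n_bins = round(1 / top_fraction)
    rho_thr = quantiles(rho, n=n_bins, method="inclusive")[-1]
    delta_thr = median(delta)
    return [i for i in range(len(rho)) if rho[i] > rho_thr and delta[i] > delta_thr]


rho = [50, 48] + list(range(1, 19))  # toy data: two dense peaks plus 18 sparse points
delta = [9.0, 8.0] + [0.1] * 18
print(select_centers(rho, delta))  # [0, 1]
```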
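One literal reading of the assignment rule is sketched below: each non-center vector goes to the nearest center whose local density is not lower than its own. Using ">=" and falling back to the nearest center overall when no center qualifies are assumptions. Note that the original density-peak clustering algorithm of Rodriguez and Laio instead assigns each point to the cluster of its nearest higher-density neighbor (not necessarily a center), processed in order of decreasing density.

```python
def assign_clusters(dist, rho, centers):
    """Assign every non-center vector to the nearest center whose local density
    is at least its own (literal reading of the rule; ">=" and the fallback to
    the nearest center overall are assumptions). Returns one label per vector."""
    labels = {c: k for k, c in enumerate(centers)}
    for i in range(len(dist)):
        if i in labels:
            continue
        candidates = [c for c in centers if rho[c] >= rho[i]] or list(centers)
        nearest = min(candidates, key=lambda c: dist[i][c])
        labels[i] = labels[nearest]
    return [labels[i] for i in range(len(dist))]


points = [0.0, 0.1, 0.2, 5.0, 5.1]  # toy 1-D data
dist = [[abs(a - b) for b in points] for a in points]
print(assign_clusters(dist, rho=[2, 2, 2, 1, 1], centers=[0, 3]))  # [0, 0, 0, 1, 1]
```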
[0073] In summary, compared with existing technologies, this application clusters the operation patterns in the behavior sample library, extracts typical operation pattern prototypes, and constructs a deep behavior pattern library from them. Using density peak clustering to condense massive numbers of discrete operations into interpretable pattern prototypes, and associating these prototypes with game event and focus tags, yields a structured behavior pattern library that provides an efficient and interpretable reference for subsequent real-time matching, similarity calculation, and focus assessment.
[0074] S40: Collect the current user's touch operation data in real time and generate the current time-series segment; compare the deviation of the current time-series segment against the statistical behavior baseline and perform similarity matching against the deep behavior pattern library to obtain a deviation comparison result and a similarity matching result.
[0075] After constructing the statistical behavior baseline and deep behavior pattern library, it is necessary to compare the current user's operational data with these two types of references to obtain multi-dimensional evidence. A single-dimensional feature, such as synchronization indicators or pattern similarity, is insufficient to fully reflect the user's operational status.
[0076] To address the aforementioned issue, this application collects the current user's touch operation data in real time, generates a current time-series segment, compares the deviation of this segment against the statistical behavior baseline, and performs similarity matching against the deep behavior pattern library, obtaining deviation comparison results and similarity matching results that provide complementary evidence for the subsequent multi-source information fusion. Specifically, step S40 of the method includes: collecting, in real time, the touch operation data generated by the current user during the current game, and performing time-series slicing and event alignment on that data to generate the current time-series segment; calculating the multi-finger operation synchronization index of the current time-series segment, where the index includes at least one of the following: the average number of simultaneous touch points, the variance of the number of simultaneous touch points, the standard deviation of the time difference between multiple fingers pressing down, the average time difference between multiple fingers lifting up, and a stability parameter for the relative distance between touch points; calculating the Z-score of each multi-finger operation synchronization index of the current time-series segment relative to each proficiency level in the statistical behavior baseline, and using the calculated Z-scores as the deviation comparison result; extracting the multidimensional temporal feature vector of the current time-series segment; and calculating the similarity between this feature vector and each typical operation pattern prototype in the deep behavior pattern library, using the calculated similarity values as the similarity matching result.
[0077] In this embodiment, touch operation data generated by the current user during the current game is first collected in real time. The touch operation data is then processed through temporal slicing and event alignment to generate a current temporal segment. Real-time touch operation data refers to data such as touch point coordinates, pressure values, and touch start and end timestamps generated by the current user during the game. The method for temporal slicing and event alignment is consistent with the methods described above, ensuring that the format of the current temporal segment is consistent with the temporal segments in the behavior sample library, facilitating subsequent comparative analysis. In this way, real-time touch data is transformed into a standardized current temporal segment, providing a unified data format for subsequent deviation comparison and similarity matching.
[0078] Secondly, the multi-finger operation synchronization index for the current time segment is calculated. This index includes at least one of the following: the average number of touch points at the same time, the variance of the number of touch points at the same time, the standard deviation of the time difference between multiple fingers pressing down, the average time difference between multiple fingers lifting up, and a stability parameter of the relative distance between touch points. The calculation method for the multi-finger operation synchronization index is consistent with the aforementioned steps, ensuring the consistency and comparability of the calculation results and enabling direct comparison with the indicators in the statistical behavior baseline.
[0079] Next, the Z-score of each multi-finger operation synchronization index of the current time-series segment is calculated relative to each proficiency level in the statistical behavior baseline, and the calculated Z-scores are used as the deviation comparison result. The Z-score is the difference between the current index value and the arithmetic mean of that index at a given proficiency level, divided by the standard deviation of that index at the same level: Z = (X - μ) / σ, where X is the current index value, μ is the arithmetic mean of the index at that level, and σ is its standard deviation at that level. The smaller the absolute value of the Z-score, the closer the current index is to the baseline for that level, and the smaller the deviation. The deviation comparison result is the set of Z-scores of each of the current user's synchronization indices relative to all proficiency levels, reflecting how far the current user's operation deviates from that of users at each proficiency level.
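The deviation comparison can be sketched as below, using a baseline in the `{level: {index: (mean, std)}}` layout assumed earlier. The small floor on σ is an added safeguard against zero variance, not part of the text. The helper that picks the level with the smallest mean |Z| is purely illustrative, since the method fuses this evidence with the other sources rather than deciding from it alone.

```python
def z_scores(current, baseline, eps=1e-9):
    """current: {index_name: value}; baseline: {level: {index_name: (mean, std)}}.

    Returns {level: {index_name: Z}} with Z = (X - mu) / sigma. The eps floor on
    sigma (to avoid dividing by zero) is an assumption, not part of the text."""
    return {
        level: {name: (x - params[name][0]) / max(params[name][1], eps)
                for name, x in current.items()}
        for level, params in baseline.items()
    }


def closest_level(z):
    """Level with the smallest mean |Z| (illustrative only; the method itself
    fuses this evidence with other sources rather than deciding from it)."""
    return min(z, key=lambda lvl: sum(abs(v) for v in z[lvl].values()) / len(z[lvl]))


baseline = {"Expert": {"std_press_gap": (0.10, 0.02)},   # toy parameters
            "Novice": {"std_press_gap": (0.30, 0.05)}}
z = z_scores({"std_press_gap": 0.14}, baseline)
print(z)                 # Expert ~2.0, Novice ~-3.2
print(closest_level(z))  # Expert
```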
[0080] Furthermore, a multi-dimensional temporal feature vector is extracted from the current time sequence segment. The extraction and normalization methods for the multi-dimensional temporal feature vector are consistent with the methods described in the preceding steps, ensuring that the feature vector format of the current time sequence segment is consistent with the typical operation pattern prototypes in the deep behavior pattern library, which facilitates subsequent similarity calculations.
[0081] Finally, the similarity between the multidimensional temporal feature vector and each typical operation pattern prototype in the deep behavior pattern library is calculated, and the calculated similarity value is used as the similarity matching result. The Euclidean distance is preferred for similarity calculation (refer to the Euclidean distance calculation formula in step S30 above). The smaller the Euclidean distance, the more similar the current feature vector is to the typical operation pattern prototype. The similarity matching result refers to the set of similarity values between the current feature vector and all typical operation pattern prototypes, reflecting the degree of matching between the current user's operation pattern and each typical operation pattern.
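Similarity matching with Euclidean distance reduces to scoring the current vector against every prototype and ranking by distance, smallest first. The sketch assumes prototypes are stored as `{pattern_id: vector}`; the pattern names are hypothetical.

```python
import math


def match_prototypes(vector, prototypes):
    """prototypes: {pattern_id: prototype_vector}. Returns (pattern_id, distance)
    pairs sorted so that the most similar prototype (smallest distance) is first."""
    scored = [(pid, math.dist(vector, proto)) for pid, proto in prototypes.items()]
    return sorted(scored, key=lambda item: item[1])


prototypes = {"burst": [0.9, 0.9], "steady": [0.1, 0.2]}  # hypothetical patterns
print(match_prototypes([0.2, 0.2], prototypes)[0][0])  # steady
```

If a bounded score is preferred downstream, a monotone transform such as 1 / (1 + d) preserves the ranking; that transform is not specified in the text.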
[0082] In summary, compared with existing technologies, this application collects the current user's touch operation data in real time, generates a current time-series segment, compares its deviation against the statistical behavior baseline, and performs similarity matching against the deep behavior pattern library to obtain deviation comparison results and similarity matching results. Calculating Z-scores quantifies how far the current operation deviates from each proficiency level, and calculating similarities quantifies how well the current operation matches each typical pattern; together they provide a complementary quantitative basis for the subsequent multi-source evidence fusion, making the evaluation more comprehensive and reliable.
[0083] S50: Input the current time-series segment into the pre-trained time-series prediction model and output predicted values of the proficiency level and the focus score.
[0084] In addition to obtaining the current user's operational characteristics through deviation comparison and similarity matching, it is desirable to learn, directly from time-series data, the complex mapping between operation patterns and proficiency and focus, in particular the long-range dependencies within an operation sequence, so that the user's eventual operation level and focus state can be predicted in advance from early operations.
[0085] To address the aforementioned issue, this application inputs the current time-series segment into a pre-trained time-series prediction model and outputs predicted values for the proficiency level and the focus score.
[0086] Specifically, step S50 of the method includes: extracting from the behavior sample library, as input features, the early operation sequence of each behavior sample within a preset time window, where the early operation sequence includes the touch point coordinate subsequence, pressure value subsequence, and touch start and end timestamp subsequence within that window; extracting the user proficiency level of each behavior sample as the first output label and the user self-rated focus score of each behavior sample as the second output label; constructing an initial time-series prediction model based on a long short-term memory (LSTM) network, where the model includes an input layer, a temporal feature extraction layer, and an output layer, and the output layer includes a classification branch that outputs the proficiency level and a regression branch that outputs the focus score; and training the initial model in a supervised manner, with the input features as model input and the first and second output labels as supervision labels, until both the classification loss of the classification branch and the regression loss of the regression branch converge, thereby obtaining the trained time-series prediction model.
[0087] In this embodiment, the early operation sequence within a preset time window of each behavior sample is first extracted from the behavior sample library as input features. The early operation sequence includes a subsequence of touch point coordinates, a subsequence of pressure values, and a subsequence of touch start and end timestamps within the preset time window. The preset time window refers to a fixed-length window in the first half of a behavior sample's time segment; for example, if the time segment is 20 seconds, the preset time window is the first 10 seconds, used to extract early features of user operations. The early operation sequence refers to the touch operation time sequence data within the preset time window, reflecting the user's initial habits and state, and can be used to predict subsequent proficiency and focus. The input features refer to the input data used for model training, derived from the early operation sequence, and possess temporal characteristics.
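Extracting the early operation sequence can be sketched as a time filter followed by fixed-length selection, so that every input matches the (100, 5) shape given in the configuration example. The row layout `(t, x, y, pressure, t_start, t_end)` and the even index selection (which repeats rows when fewer than 100 are available) are assumptions; the text does not say how the sampling points are produced.

```python
def early_window(steps, window=10.0, n_steps=100):
    """steps: time-ordered rows (t, x, y, pressure, t_start, t_end) of one segment
    (assumed layout). Keeps rows within the first `window` seconds, then picks
    n_steps evenly spaced rows so every input has shape (n_steps, 5)."""
    t0 = steps[0][0]
    early = [row for row in steps if row[0] - t0 < window]
    # Even index selection; repeats rows when fewer than n_steps are available
    picks = [(k * len(early)) // n_steps for k in range(n_steps)]
    return [early[i][1:] for i in picks]


# Toy 20-second segment sampled every 0.1 s
steps = [(i * 0.1, 0.5, 0.5, 0.3, i * 0.1, i * 0.1 + 0.05) for i in range(200)]
seq = early_window(steps)
print(len(seq), len(seq[0]))  # 100 5
```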
[0088] Secondly, the user proficiency level corresponding to each behavior sample is extracted from the behavior sample database as the first output label, and the user's self-rated focus score corresponding to each behavior sample is extracted as the second output label. The first output label is the supervision label used by the model to predict the user proficiency level, consistent with the proficiency level mentioned in the previous step; the second output label is the supervision label used by the model to predict the user focus score, consistent with the user's self-rated focus label mentioned in the previous step. The two output labels correspond to the two prediction tasks of the model, achieving simultaneous prediction of proficiency level and focus score. Specifically, extracting the supervision labels for model training provides a basis for supervised training, ensuring that the model can learn the correlation between input features and output labels.
[0089] Next, an initial temporal prediction model is constructed using a Long Short-Term Memory (LSTM) network. This model comprises an input layer, a temporal feature extraction layer, and an output layer. The output layer includes a classification branch for outputting proficiency levels and a regression branch for outputting attention scores. LSTM possesses powerful temporal feature extraction capabilities, capturing temporal dependencies in input features, making it suitable for processing temporal data such as touch operations. The input layer receives input features (early operation sequences). The temporal feature extraction layer, composed of LSTM layers, extracts temporal features from the input. The output layer has two branches: a classification branch outputting user proficiency levels (discrete classification task) using the Softmax activation function, and a regression branch outputting attention scores (continuous regression task) using a linear activation function.
[0090] For example, the model structure and parameter configuration of the initial time series prediction model can be referenced as follows: the input layer dimension is consistent with the feature dimension of the early operation sequence. For example, the features of each time step include touch point coordinates (2D), pressure intensity (1D), and touch start and end timestamps (2D), for a total of 5 features. The time step size is set to the number of sampling points within a preset time window (e.g., 100 sampling points), then the shape of the input layer is (100, 5). The time series feature extraction layer consists of two stacked LSTM layers. The first LSTM layer has 128 hidden units and returns the complete sequence as the input of the next layer; the second LSTM layer has 64 hidden units and only returns the output of the last time step. A Dropout layer is added after each LSTM layer with a dropout rate of 0.2 to prevent overfitting. After extracting the temporal features, the output of the second LSTM layer (64-dimensional features) is fed into two branches: the classification branch consists of a fully connected layer with the number of neurons equal to the number of proficiency levels (e.g., 5), and the activation function is Softmax, outputting the predicted probability of each level; the regression branch consists of a fully connected layer with the number of neurons 1, and the activation function is a linear activation function, outputting the predicted value of the focus score.
[0091] Finally, with the input features as model input and the first and second output labels as supervision labels, the initial time-series prediction model is trained in a supervised manner until both the classification loss of the classification branch and the regression loss of the regression branch converge, yielding the trained time-series prediction model. Supervised training adjusts the model's network parameters using the input features and the corresponding supervision labels so that the model's predictions approach the labels as closely as possible. The classification loss is preferably the cross-entropy loss, which measures the prediction error of the classification branch; the regression loss is preferably the mean squared error (MSE) loss, which measures the prediction error of the regression branch. The convergence criterion is that, over a preset number of iterations (optionally 20), the change in each of the two loss values is less than a preset threshold (optionally 0.0001), at which point further training no longer meaningfully reduces the prediction error and training is complete.
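The convergence rule can be expressed as a small check. This sketch assumes one reading of "the change over a preset number of iterations": the spread (maximum minus minimum) of the last `window` recorded loss values. Both function names are hypothetical.

```python
def has_converged(losses, window=20, threshold=1e-4):
    """True if the last `window` loss values vary by less than `threshold`."""
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) < threshold

def training_converged(cls_losses, reg_losses, window=20, threshold=1e-4):
    """Both branch losses must satisfy the criterion before training stops."""
    return (has_converged(cls_losses, window, threshold)
            and has_converged(reg_losses, window, threshold))
```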
[0092] For example, the time-series prediction model may be trained as follows. 1. Data preparation: the input features and the corresponding first and second output labels are randomly divided into a training set, a validation set, and a test set in a ratio of 7:1.5:1.5 (i.e., 70%, 15%, and 15%). The training set is used to update the model parameters, the validation set is used for hyperparameter tuning and convergence checks, and the test set is used for the final evaluation of model performance.
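A minimal sketch of the random 7:1.5:1.5 split, assuming the samples are held in a Python list and a fixed seed is wanted for reproducibility. `split_samples` is a hypothetical helper; in practice each element would be an (input features, first label, second label) tuple rather than the plain integers used here.

```python
import random

def split_samples(samples, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle indices once, then cut them into train / validation / test parts."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_train = int(round(ratios[0] * len(idx)))
    n_val = int(round(ratios[1] * len(idx)))
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]   # remainder goes to the test set
    return train, val, test

train, val, test = split_samples(list(range(1000)))
```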
[0093] 2. Model training: the input features of the training set are used as the model input, and the corresponding first and second output labels are used as supervision labels. A weighted loss function L = 0.5·Lcls + 0.5·Lreg is adopted, where Lcls is the cross-entropy loss of the classification branch and Lreg is the mean squared error loss of the regression branch, each weighted by 0.5. The Adam optimizer is used with an initial learning rate of 0.001, a batch size of 32, and at most 200 training epochs. After each epoch, the classification accuracy and the regression mean squared error are computed on the validation set. The model is considered to have converged, and training stops, when the validation classification accuracy has not improved for 10 consecutive epochs and the validation regression mean squared error has not decreased for 10 consecutive epochs.
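The weighted loss can be sketched per batch as follows. Assumptions: the classification branch outputs one probability list per sample, proficiency levels are integer indices, and focus scores are scaled to a comparable range (for example 0 to 1) so that the 0.5/0.5 weighting is not dominated by the MSE term. The helper names are hypothetical.

```python
import math

def cross_entropy(probs, true_level):
    """Negative log-probability of the true level (clipped to avoid log(0))."""
    return -math.log(max(probs[true_level], 1e-12))

def weighted_loss(prob_batch, level_batch, pred_focus, true_focus, w_cls=0.5, w_reg=0.5):
    """L = w_cls * mean cross-entropy + w_reg * mean squared error."""
    n = len(level_batch)
    l_cls = sum(cross_entropy(p, y) for p, y in zip(prob_batch, level_batch)) / n
    l_reg = sum((p - t) ** 2 for p, t in zip(pred_focus, true_focus)) / n
    return w_cls * l_cls + w_reg * l_reg, l_cls, l_reg

loss, l_cls, l_reg = weighted_loss(
    prob_batch=[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    level_batch=[0, 1],
    pred_focus=[0.6, 0.9],      # focus scores scaled to 0..1 (assumption)
    true_focus=[0.5, 1.0],
)
```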
[0094] 3. Model saving: the model parameters that perform best on the validation set, that is, the weights with the highest classification accuracy and the lowest regression error, are saved as the trained time-series prediction model.
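Steps 2 and 3 combine a patience rule with best-checkpoint tracking. The sketch below uses a hypothetical `EarlyStopper` helper that stops only when both validation metrics have stalled for `patience` epochs. Because the source does not say how to choose between an epoch with the best accuracy and a different epoch with the lowest MSE, it assumes accuracy takes priority and MSE breaks ties.

```python
class EarlyStopper:
    """Patience-based stopping on validation accuracy and MSE (hypothetical helper)."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_acc = float("-inf")
        self.best_mse = float("inf")
        self.acc_stall = 0
        self.mse_stall = 0
        self.best_key = None
        self.best_weights = None

    def update(self, val_acc, val_mse, weights):
        """Record one epoch; return True when training should stop."""
        if val_acc > self.best_acc:
            self.best_acc, self.acc_stall = val_acc, 0
        else:
            self.acc_stall += 1
        if val_mse < self.best_mse:
            self.best_mse, self.mse_stall = val_mse, 0
        else:
            self.mse_stall += 1
        # Checkpoint choice: higher accuracy first, lower MSE as tie-breaker (assumption).
        key = (val_acc, -val_mse)
        if self.best_key is None or key > self.best_key:
            self.best_key, self.best_weights = key, weights
        return self.acc_stall >= self.patience and self.mse_stall >= self.patience

# Simulated validation curve: improves for 5 epochs, then plateaus.
stopper = EarlyStopper(patience=10)
stop_epoch = None
for epoch in range(200):
    if epoch < 5:
        acc, mse = 0.5 + 0.05 * epoch, 1.0 - 0.1 * epoch
    else:
        acc, mse = 0.6, 0.8
    if stopper.update(acc, mse, weights=f"w{epoch}"):
        stop_epoch = epoch
        break
```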
[0095] In summary, compared with existing technologies, this application inputs the current time-series segment into a pre-trained time-series prediction model, which outputs predicted values of the proficiency level and the focus score. By using an LSTM model to learn temporal dependencies from early operation sequences, the application predicts the user's proficiency level and focus score early, providing independent predictive evidence for multi-source evidence fusion and compensating for the limited ability of the statistical baseline and the pattern library to capture temporal dependencies.
[0096] S60: The deviation comparison result, the similarity matching result, the proficiency level prediction value, and the focus score prediction value are fused using a Bayesian fusion method to obtain the current user's operation proficiency level and focus score.
[0097] Through the aforementioned steps, multidimensional evidence has been obtained: the statistical behavioral baseline provides the degree of deviation between the current operation and each proficiency level (Z score), the deep behavioral pattern library provides the degree of matching (similarity) between the current operation and each typical pattern, and the time-series prediction model provides independent proficiency level and focus score prediction values.
[0098] However, single pieces of evidence have limitations—deviation cannot be directly mapped to probability, similarity lacks a direct correlation with proficiency, and model predictions may be biased.
[0099] To address the aforementioned issues, this application fuses the deviation comparison results, similarity matching results, proficiency level predictions, and focus score predictions using a Bayesian fusion method to obtain the current user's operational proficiency level and focus score. In this way, by comprehensively utilizing the advantages of each piece of evidence, a comprehensive and reliable user status assessment result is obtained.
[0100] Specifically, step S60 of the method includes: converting the Z-scores in the deviation comparison result into a posterior probability distribution of the current user over the user proficiency levels; normalizing the similarity values in the similarity matching result to obtain a matching probability distribution between the current user's operation pattern and each typical operation pattern prototype; mapping, based on the user self-rated focus tags associated with each typical operation pattern prototype stored in the deep behavior pattern library, the matching probability distribution to a prior distribution of the focus score; using the predicted proficiency level output by the time-series prediction model as the observed likelihood of the current user's proficiency level, and the predicted focus score output by the time-series prediction model as the observed likelihood of the current user's focus score; fusing, with weights, the posterior probability distribution, the prior distribution, the observed likelihood of the proficiency level, and the observed likelihood of the focus score using a Bayesian fusion formula to obtain the fused posterior probability that the current user belongs to each user proficiency level; selecting the user proficiency level with the largest fused posterior probability as the current user's operational proficiency level; and calculating, based on the fused posterior probability and the prior distribution of the focus score, the posterior expected value of the focus score, which is used as the current user's focus score.
[0101] In this embodiment, the Z-scores in the deviation comparison result are first converted into a posterior probability distribution over the current user's proficiency levels. A Z-score reflects how far the current user's operation deviates from the baseline of a given proficiency level: the smaller its absolute value, the more likely the current user belongs to that level. Each Z-score is converted into a probability using the normal distribution function, so that each proficiency level receives one posterior probability; after normalization these probabilities sum to 1, forming a posterior probability distribution that quantifies how likely the current user is to belong to each level.
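The Z-score conversion can be sketched with the standard normal density. The source does not specify how several synchronization indices are combined, so this sketch assumes they are independent and multiplies their densities per level before normalizing. `z_to_level_probs` is a hypothetical name and the Z-scores are made-up inputs.

```python
import math

def normal_pdf(z):
    """Standard normal density at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def z_to_level_probs(z_by_level):
    """z_by_level: {level: [z per synchronization index]} -> {level: probability}."""
    scores = {}
    for level, zs in z_by_level.items():
        # Independence across indices is an assumption, not stated in the source.
        scores[level] = math.prod(normal_pdf(z) for z in zs)
    total = sum(scores.values())
    return {level: s / total for level, s in scores.items()}

probs = z_to_level_probs({
    "beginner": [0.3, -0.5],
    "intermediate": [1.2, 1.0],
    "expert": [2.5, 2.1],
})
```

With many indices, summing log-densities instead of multiplying densities would avoid numerical underflow.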
[0102] Next, the similarity values in the similarity matching result are normalized and used as the matching probability distribution between the current user's operation pattern and each typical operation pattern prototype. Then, based on the user self-rated focus tags associated with each prototype in the deep behavioral pattern library, the matching probability distribution is mapped to a prior distribution of focus scores. Normalization here means scaling the similarity values into the interval [0, 1] so that they sum to 1, yielding the matching probability distribution, which reflects how likely the current user's operation pattern is to match each typical pattern. The prior distribution of focus scores is the probability distribution of the current user's focus score obtained by combining the focus tags associated with each typical pattern with the matching probability distribution; it gives a preliminary estimate of the current user's focus.
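A sketch of the normalization and the mapping to a focus prior, assuming the similarity values are non-negative (for example cosine similarity clipped at 0) and that each prototype carries one representative focus tag, such as the mean self-rated focus of its cluster. The prototype IDs and tag values below are made up for illustration.

```python
from collections import defaultdict

def matching_distribution(similarities):
    """Normalize non-negative similarity values so they sum to 1."""
    total = sum(similarities.values())
    return {proto: s / total for proto, s in similarities.items()}

def focus_prior(match_probs, proto_focus_tag):
    """Add up matching probabilities per focus tag -> prior over focus scores."""
    prior = defaultdict(float)
    for proto, p in match_probs.items():
        prior[proto_focus_tag[proto]] += p
    return dict(prior)

match = matching_distribution({"P1": 0.9, "P2": 0.6, "P3": 0.3, "P4": 0.6})
prior = focus_prior(match, {"P1": 80, "P2": 60, "P3": 95, "P4": 80})
# prior is approximately {80: 0.625, 60: 0.25, 95: 0.125}
```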
[0103] Next, the predicted proficiency level output by the time-series prediction model is used as the observed likelihood of the current user's proficiency level, and the predicted focus score output by the model is used as the observed likelihood of the current user's focus score. Here, the observed likelihood is a probability estimate of the current user's proficiency level and focus score derived from the model's predictions: the observed likelihood of the proficiency level is the predicted probability of each proficiency level output by the model, and the observed likelihood of the focus score is a probability distribution over the focus score derived from the model's output, such as a normal distribution.
[0104] Furthermore, a Bayesian fusion formula is used to fuse, with weights, the posterior probability distribution, the prior distribution, the observed likelihood of the proficiency level, and the observed likelihood of the focus score, yielding the fused posterior probability that the current user belongs to each user proficiency level. The proficiency level with the highest fused posterior probability is selected as the current user's operational proficiency level. The Bayesian fusion formula, based on Bayes' theorem, combines multiple probability distributions with weights that can be preset according to the reliability of each source of evidence; for example, the weights for the deviation, the similarity, and the model prediction may be 0.3, 0.3, and 0.4, respectively. The fused posterior probability is the final probability that the current user belongs to each proficiency level after all evidence has been fused. Selecting the level with the highest fused posterior probability yields the most probable proficiency level under the combined evidence.
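The source gives weights (0.3, 0.3, 0.4) but does not state how they enter the Bayesian formula. One common choice, assumed here and not confirmed by the source, is log-linear pooling, in which each likelihood is raised to the power of its weight: P(Level i | Evidence) ∝ P(Level i) · Π_k L_k(i)^(w_k). With all weights equal to 1 it reduces to the unweighted product of [0105]. `weighted_fusion` is a hypothetical helper.

```python
import math

def weighted_fusion(prior, likelihoods, weights):
    """Log-linear pooling: P(i) proportional to prior(i) * prod_k L_k(i) ** w_k."""
    log_scores = {}
    for level, p in prior.items():
        s = math.log(p)
        for lik, w in zip(likelihoods, weights):
            s += w * math.log(lik[level])
        log_scores[level] = s
    m = max(log_scores.values())                 # subtract the max for numerical stability
    unnorm = {lvl: math.exp(s - m) for lvl, s in log_scores.items()}
    total = sum(unnorm.values())
    return {lvl: u / total for lvl, u in unnorm.items()}

prior = {"beginner": 1 / 3, "intermediate": 1 / 3, "expert": 1 / 3}
evidence = [
    {"beginner": 0.7, "intermediate": 0.2, "expert": 0.1},   # deviation
    {"beginner": 0.2, "intermediate": 0.5, "expert": 0.3},   # similarity matching
    {"beginner": 0.1, "intermediate": 0.3, "expert": 0.6},   # model prediction
]
weighted = weighted_fusion(prior, evidence, weights=(0.3, 0.3, 0.4))
unweighted = weighted_fusion(prior, evidence, weights=(1.0, 1.0, 1.0))
```

On the numbers of [0106], intermediate still has the highest fused probability with weights (0.3, 0.3, 0.4), but the distribution is flatter than with the unweighted product, because weights below 1 soften each likelihood.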
[0105] The Bayesian fusion formula, based on Bayes' theorem, fuses the probability distributions of multi-source evidence. Its core idea is: fused posterior probability ∝ prior probability × likelihood function. In this embodiment, for each proficiency level i, the fused posterior probability is calculated as P(Level i | Evidence) ∝ P(Level i) · P(Deviation | Level i) · P(Matching | Level i) · P(Predicted Level | Level i), where ∝ denotes proportionality; P(Level i) is the prior distribution of proficiency levels, which can be preset as a uniform distribution or as the empirical distribution of historical users; P(Deviation | Level i) is the posterior probability distribution converted from the deviation comparison result (Z-score), reflecting how well the current operation matches the baseline of each level; P(Matching | Level i) is the posterior probability distribution mapped from the similarity matching result, reflecting how well the current operation pattern matches the typical pattern of each level; and P(Predicted Level | Level i) is the observation likelihood given by the proficiency level prediction of the time-series prediction model, that is, the probability the model assigns to each level.
[0106] For example, suppose the proficiency levels are divided into three levels: beginner, intermediate, and expert. The prior is set to a uniform distribution, i.e., P(beginner) = P(intermediate) = P(expert) = 1/3. The likelihoods obtained from the deviation comparison are P(deviation | beginner) = 0.7, P(deviation | intermediate) = 0.2, and P(deviation | expert) = 0.1. The likelihoods obtained from the similarity matching are P(match | beginner) = 0.2, P(match | intermediate) = 0.5, and P(match | expert) = 0.3. The probabilities output by the time-series prediction model are P(predicted level | beginner) = 0.1, P(predicted level | intermediate) = 0.3, and P(predicted level | expert) = 0.6. For simplicity, this example multiplies the evidence terms without weights. The unnormalized posterior probabilities are: beginner, 1/3 × 0.7 × 0.2 × 0.1 ≈ 0.00467; intermediate, 1/3 × 0.2 × 0.5 × 0.3 = 0.01; expert, 1/3 × 0.1 × 0.3 × 0.6 = 0.006; their sum is 0.00467 + 0.01 + 0.006 = 0.02067. After normalization, the posterior probabilities are: beginner, 0.00467 / 0.02067 ≈ 0.226; intermediate, 0.01 / 0.02067 ≈ 0.484; expert, 0.006 / 0.02067 ≈ 0.290. The level with the highest normalized probability, intermediate, is selected as the current user's proficiency level.
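The arithmetic of this example can be reproduced in a few lines; `fuse` is a hypothetical helper that multiplies the prior by each likelihood and normalizes, following the formula of [0105].

```python
def fuse(prior, *likelihoods):
    """Return (unnormalized, normalized) posteriors: prior(i) times each likelihood(i)."""
    unnorm = {}
    for level, p in prior.items():
        for lik in likelihoods:
            p *= lik[level]
        unnorm[level] = p
    total = sum(unnorm.values())
    return unnorm, {level: u / total for level, u in unnorm.items()}

prior = {"beginner": 1 / 3, "intermediate": 1 / 3, "expert": 1 / 3}
deviation = {"beginner": 0.7, "intermediate": 0.2, "expert": 0.1}
matching = {"beginner": 0.2, "intermediate": 0.5, "expert": 0.3}
predicted = {"beginner": 0.1, "intermediate": 0.3, "expert": 0.6}

unnorm, posterior = fuse(prior, deviation, matching, predicted)
level = max(posterior, key=posterior.get)
print({k: round(v, 3) for k, v in posterior.items()}, level)
# {'beginner': 0.226, 'intermediate': 0.484, 'expert': 0.29} intermediate
```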
[0107] Finally, based on the fused posterior probability and the prior distribution of the focus score, the posterior expected value of the focus score is calculated and used as the current user's focus score. The posterior expected value is the expected focus score computed from the fused posterior probability and the prior distribution of focus; it reflects the influence of all evidence on the focus estimate. For example, the posterior expected value of the focus score may be calculated as s = Σ_i P(Level i | Evidence) · s̄_i, where s is the posterior expected value of the focus score, P(Level i | Evidence) is the fused posterior probability of level i, and s̄_i is the average focus score associated with the typical operation pattern prototype corresponding to level i. The posterior expectation weights each pattern's contribution to focus by the fused posterior probability to obtain the final focus score.
[0108] For example, continuing the previous example, if the average focus scores associated with the pattern prototypes are 60 for the beginner pattern, 80 for the intermediate pattern, and 95 for the expert pattern, then the posterior expected value of the focus score is s = 0.226 × 60 + 0.484 × 80 + 0.290 × 95 = 79.83; that is, the current user's focus score is 79.83.
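The posterior expectation of [0107] and [0108] as a short sketch; `expected_focus` is a hypothetical helper, and the pairing of each level with one prototype's mean focus score mirrors the example.

```python
def expected_focus(posterior, mean_focus):
    """s = sum_i P(level i | evidence) * mean focus score of the prototype for level i."""
    return sum(posterior[level] * mean_focus[level] for level in posterior)

posterior = {"beginner": 0.226, "intermediate": 0.484, "expert": 0.290}
mean_focus = {"beginner": 60, "intermediate": 80, "expert": 95}
s = expected_focus(posterior, mean_focus)   # approximately 79.83
```

Feeding in the unrounded posteriors (about 0.2258, 0.4839 and 0.2903) gives about 79.84; the 79.83 in the example comes from the three-decimal values.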
[0109] In summary, compared to existing technologies, this application fuses the deviation comparison results, similarity matching results, proficiency level prediction values, and focus score prediction values using a Bayesian fusion method to obtain the current user's operational proficiency level and focus score. Thus, by probabilistically integrating multi-source evidence such as statistical baseline deviation, pattern library similarity, and time-series prediction model output, it fully leverages the advantages of each piece of evidence, overcomes the limitations of single evidence, and achieves a comprehensive and accurate assessment of the user's proficiency level and focus score, significantly improving the reliability and robustness of the assessment results.
[0110] In summary, the embodiments of this application have at least the following technical effects: Compared to existing technologies, this application first collects massive amounts of historical touch operation data from mobile game users, and then performs time-series slicing and event alignment processing on the historical touch operation data. This transforms the continuous raw touch stream into structured time-series samples with in-game event tags and user self-rated attention tags, constructing a behavioral sample library containing touch-event alignment sequences and tags. This provides high-quality and diverse basic data for subsequent statistical analysis, pattern clustering, and model training, ensuring the accuracy and generalization ability of behavioral analysis. Secondly, based on the behavioral sample library, the distribution parameters of the multi-finger operation synchronization index under each proficiency level are statistically analyzed to construct a statistical behavioral baseline. This quantifies the synchronization index of users with different proficiency levels into a statistical benchmark with distribution parameters, providing a quantitative comparison basis for subsequent real-time assessment of the deviation between the current user's operation and the standard baseline. Simultaneously, density peak clustering is performed on the operation patterns in the behavioral sample library to extract typical operation pattern prototypes. Based on these prototypes, a deep behavioral pattern library is constructed, summarizing massive discrete operations into interpretable pattern prototypes. These prototypes are then linked to in-game event tags and user self-assessed focus tags, providing an efficient and interpretable reference basis for subsequent real-time matching, similarity calculation, and focus assessment.
[0111] Then, the current user's touch operation data is collected in real time to generate the current time-series segment. The Z-scores of the multi-finger operation synchronization index of the current time-series segment relative to each user proficiency level in the statistical behavior baseline are calculated to obtain the deviation comparison result, which quantifies how far the current operation deviates from each proficiency level. At the same time, the multidimensional temporal feature vector of the current time-series segment is extracted, and its similarity to each typical operation pattern prototype in the deep behavior pattern library is calculated to obtain the similarity matching result, which quantifies how well the current operation pattern matches each typical operation pattern. Meanwhile, the current time-series segment is input into the pre-trained time-series prediction model, which is built on a long short-term memory network; by learning temporal dependencies from early operation sequences, it outputs predicted values of the proficiency level and the focus score, providing independent predictive evidence for multi-source evidence fusion. Finally, the Z-scores in the deviation comparison result are converted into a posterior probability distribution; the similarity values in the similarity matching result are normalized into a matching probability distribution and mapped to the prior distribution of the focus score; and the predicted proficiency level and predicted focus score output by the time-series prediction model are used as observation likelihoods. These are fused with weights using the Bayesian fusion formula to obtain the fused posterior probability. The user proficiency level with the highest fused posterior probability is selected as the current user's operational proficiency level, and the posterior expected value calculated from the fused posterior probability and the prior distribution of the focus score is used as the current user's focus score, thereby achieving probabilistic integration of multi-source evidence.
[0112] Through the above technical solutions, this application constructs a complete analysis chain from historical data mining and behavioral pattern extraction to real-time multi-source evidence fusion, effectively solving the technical problems of existing methods such as coarse granularity, poor real-time performance, and difficulty in quantifying focus. It improves the accuracy and real-time performance of mobile game user behavior analysis and provides reliable technical support for personalized game recommendations, skill assessment, and anti-cheating detection.
[0113] Embodiment 2: as shown in Figure 2, based on the same inventive concept as the mobile game user behavior analysis method based on big data analysis provided in Embodiment 1, this embodiment of the invention further provides a mobile game user behavior analysis system based on big data analysis, including: a behavior sample construction module 11, configured to collect historical touch operation data of a large number of mobile game users and perform time-series slicing and event alignment on the historical touch operation data to construct a behavior sample library; a statistical behavior baseline construction module 12, configured to statistically analyze, based on the behavior sample library, the distribution parameters of the multi-finger operation synchronization index at each proficiency level and construct a statistical behavior baseline; a deep behavior pattern library construction module 13, configured to cluster the operation patterns in the behavior sample library, extract typical operation pattern prototypes, and construct a deep behavior pattern library from the typical operation pattern prototypes; a real-time user behavior analysis module 14, configured to collect the current user's touch operation data in real time, generate the current time-series segment, compare the current time-series segment against the statistical behavior baseline for deviation, and match it against the deep behavior pattern library for similarity, obtaining the deviation comparison result and the similarity matching result; a time-series prediction module 15, configured to input the current time-series segment into the pre-trained time-series prediction model and output the predicted values of the proficiency level and the focus score; and a fusion analysis module 16, configured to fuse the deviation comparison result, the similarity matching result, the proficiency level prediction value, and the focus score prediction value using a Bayesian fusion method to obtain the current user's operational proficiency level and focus score.
[0114] The behavior sample construction module 11 is specifically configured to: collect the massive raw touch event streams generated by mobile game users during historical gameplay, where the raw touch event stream includes at least the touch point coordinates, pressure value, and touch start and end timestamps of each touch event; collect in-game event tags and user self-rated focus tags that are time-synchronized with the raw touch event stream, where the in-game event tags include at least victory event tags, defeat event tags, and level completion event tags; slice the raw touch event stream into multiple time segments using a preset fixed-duration window; for each time segment, perform a temporal correlation analysis between the touch event sequence within the segment and the in-game event tag sequence within the same time window to obtain a touch-event alignment sequence; take the touch-event alignment sequence, together with the in-game event tags and user self-rated focus tags associated with that time segment, as one behavior sample; and construct the behavior sample library from the collection of all behavior samples.
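A minimal sketch of fixed-duration slicing, assuming each touch event is assigned to a window by its start timestamp and timestamps are in seconds. The `TouchEvent` type and the `slice_fixed_windows` helper are hypothetical. In-game event tags would be sliced with the same window boundaries so that each segment can be paired with its tags.

```python
from dataclasses import dataclass

@dataclass
class TouchEvent:
    t_start: float    # touch start timestamp (s)
    t_end: float      # touch end timestamp (s)
    x: float          # touch point coordinates
    y: float
    pressure: float   # pressure value

def slice_fixed_windows(events, window):
    """Group events into consecutive `window`-second segments by start time."""
    if not events:
        return []
    t0 = min(e.t_start for e in events)
    segments = {}
    for e in events:
        k = int((e.t_start - t0) // window)   # index of the window this event falls in
        segments.setdefault(k, []).append(e)
    # Keep empty windows as empty lists so segment indices stay aligned with time.
    return [segments.get(k, []) for k in range(max(segments) + 1)]

events = [
    TouchEvent(0.1, 0.2, 10, 20, 0.5),
    TouchEvent(0.5, 0.7, 12, 22, 0.6),
    TouchEvent(2.2, 2.4, 30, 40, 0.4),
    TouchEvent(4.9, 5.0, 50, 60, 0.7),
]
segs = slice_fixed_windows(events, window=2.0)   # segment sizes: 2, 1, 1
```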
[0115] Specifically, performing, for each time segment, a temporal correlation analysis between the touch event sequence within the time segment and the in-game event tag sequence within the same time window to obtain a touch-event alignment sequence includes: extracting feature change points from the touch event sequence, where the feature change points include at least one of: abrupt changes in pressure intensity, extreme points of touch point coordinate acceleration, and points where the number of touch points increases or decreases; taking the moment at which an in-game event tag appears or changes from one state to another as the event occurrence time; computing the cross-correlation function between the feature change points and the event occurrence times, sliding the time offset to find the offset that maximizes the cross-correlation, and taking that offset as the response delay between touch events and game events; and shifting the touch event sequence backward or forward by the response delay, used as the alignment offset, so that touch events line up with the corresponding game events on the timeline, thereby obtaining the touch-event alignment sequence.
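The cross-correlation alignment can be sketched on a common time grid, assuming the feature change points and event occurrence times have already been converted to 0/1 indicator sequences of equal length (for example 10 ms bins). A positive lag means touches follow events; the touch timestamps are then shifted back by that lag. The function names and the bin size are hypothetical.

```python
import numpy as np

def estimate_response_delay(change_points, event_points, max_lag):
    """Lag (in bins) maximizing sum_t c[t + lag] * e[t]; inputs must have equal length."""
    c = np.asarray(change_points, dtype=float)
    e = np.asarray(event_points, dtype=float)
    best_lag, best_val = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            val = np.dot(c[lag:], e[:len(e) - lag])
        else:
            val = np.dot(c[:lag], e[-lag:])
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def align(touch_times, lag_bins, bin_size):
    """Shift touch timestamps back by the estimated delay so they line up with events."""
    return [t - lag_bins * bin_size for t in touch_times]

events = np.zeros(80)
events[[10, 30, 50]] = 1          # in-game event onsets (10 ms bins)
changes = np.zeros(80)
changes[[13, 33, 53]] = 1         # touch feature change points
lag = estimate_response_delay(changes, events, max_lag=5)   # 3 bins, i.e. 30 ms
```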
[0116] Specifically, the statistical behavior baseline construction module 12 is configured to: extract the user proficiency level corresponding to each behavior sample from the behavior sample library, where the user proficiency levels are predefined as multiple discrete levels based on the user's game rank data, win rate data, or operation score data; for each user proficiency level, select from the behavior sample library all behavior samples belonging to that level to form the corresponding behavior sample subset; for each behavior sample in each subset, calculate its multi-finger operation synchronization index, which includes at least one of: the average number of simultaneous touch points, the variance of the number of simultaneous touch points, the standard deviation of the time differences between multiple fingers pressing down, the average time difference between multiple fingers lifting up, and a stability parameter of the relative distance between touch points; statistically analyze the multi-finger operation synchronization indices of all behavior samples at each user proficiency level by calculating the arithmetic mean and standard deviation of each index, which serve as the distribution parameters of that index at that level; and combine the distribution parameters of all user proficiency levels into the statistical behavior baseline.
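A sketch of the baseline statistics, with one synchronization index (the average number of simultaneous touch points) computed by sampling finger-down intervals on a fixed grid. Assumptions not fixed by the source: each touch is a (start, end) interval in seconds, and the standard deviation is the sample standard deviation (`statistics.stdev`, which needs at least two samples per level). The helper names and sample values are hypothetical.

```python
import statistics
from collections import defaultdict

def mean_simultaneous_touches(intervals, t0, t1, step=0.01):
    """Average number of fingers down, sampled every `step` seconds over [t0, t1)."""
    n_steps = int(round((t1 - t0) / step))
    counts = []
    for k in range(n_steps):
        t = t0 + k * step
        counts.append(sum(1 for start, end in intervals if start <= t < end))
    return sum(counts) / len(counts)

def build_baseline(samples):
    """samples: iterable of (level, {index_name: value}) -> {level: {index_name: (mean, std)}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for level, indices in samples:
        for name, value in indices.items():
            grouped[level][name].append(value)
    return {
        level: {name: (statistics.mean(v), statistics.stdev(v)) for name, v in by_index.items()}
        for level, by_index in grouped.items()
    }

samples = [
    ("beginner", {"mean_touches": 1.0}),
    ("beginner", {"mean_touches": 1.4}),
    ("expert", {"mean_touches": 2.0}),
    ("expert", {"mean_touches": 2.4}),
]
baseline = build_baseline(samples)   # beginner mean_touches: mean 1.2, std about 0.283
```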
[0117] The deep behavior pattern library construction module 13 is specifically configured to: extract the touch point coordinate sequence, pressure value sequence, and touch start and end timestamp sequence from each behavior sample; normalize each of these sequences separately and concatenate the normalized sequences along the time axis to form a multidimensional temporal feature vector; divide the multidimensional temporal feature vectors into multiple clusters; for each cluster, calculate the arithmetic mean of all multidimensional temporal feature vectors in the cluster and use it as the typical operation pattern prototype of that cluster; and store each typical operation pattern prototype in association with the in-game event tags and user self-rated focus tags of the behavior samples in its cluster, thereby constructing the deep behavior pattern library.
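A sketch of the feature vector and prototype steps, assuming per-sample min-max normalization of each channel and a time-major layout in which the five per-step values are interleaved before flattening. The source fixes neither choice, and the helper names are hypothetical.

```python
import numpy as np

def min_max(a):
    """Scale a 1-D array into [0, 1]; a constant array maps to zeros."""
    a = np.asarray(a, dtype=float)
    span = a.max() - a.min()
    return np.zeros_like(a) if span == 0 else (a - a.min()) / span

def feature_vector(xy, pressure, t_start, t_end):
    """xy: (T, 2); pressure, t_start, t_end: (T,). Returns a flat (T * 5,) vector."""
    xy = np.asarray(xy, dtype=float)
    cols = [min_max(xy[:, 0]), min_max(xy[:, 1]),
            min_max(pressure), min_max(t_start), min_max(t_end)]
    return np.stack(cols, axis=1).ravel()   # time-major: x0, y0, p0, s0, e0, x1, ...

def prototypes(vectors, labels):
    """Arithmetic mean of the feature vectors in each cluster -> {label: prototype}."""
    vectors, labels = np.asarray(vectors, dtype=float), np.asarray(labels)
    return {int(k): vectors[labels == k].mean(axis=0) for k in np.unique(labels)}

v = feature_vector([[0, 0], [10, 20]], [0.2, 0.6], [0.0, 0.1], [0.05, 0.3])
protos = prototypes([[0, 0], [2, 2], [10, 10]], [0, 0, 1])
```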
[0118] Specifically, dividing the multidimensional temporal feature vectors into multiple clusters includes: calculating the Euclidean distance between each multidimensional temporal feature vector and every other vector to construct a distance matrix; calculating, from the distance matrix, the local density of each vector, equal to the number of other vectors whose distance from it is less than a preset cutoff distance; taking, from the distance matrix, the minimum Euclidean distance between each vector and all other vectors of higher local density as that vector's minimum distance; drawing a decision graph with local density as the first coordinate and minimum distance as the second coordinate; identifying as cluster centers the vectors in the decision graph whose local density exceeds a preset density threshold and whose minimum distance exceeds a preset distance threshold; and assigning each non-center vector to the cluster of the nearest cluster center with higher local density, thereby obtaining multiple clusters.
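Density peak clustering as described here can be sketched in NumPy. Details the text leaves open are filled with standard conventions and should be read as assumptions: the densest vector's minimum distance is set to its largest distance to any other vector, the densest vector is always kept as a center, ties in density are broken by index order, and each non-center vector inherits the label of its nearest higher-density vector (the usual density-peak assignment rule, which is how the final step is read here).

```python
import numpy as np

def density_peak_clustering(X, d_c, rho_min, delta_min):
    """Return (labels, centers, rho, delta) for the rows of X."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))  # distance matrix
    rho = (D < d_c).sum(axis=1) - 1              # neighbours within d_c, excluding self
    order = np.argsort(-rho, kind="stable")      # decreasing density, ties by index
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = D[i].max()                # convention for the densest point
            continue
        higher = order[:rank]                    # points ranked denser than i
        j = higher[np.argmin(D[i, higher])]
        delta[i], nearest_higher[i] = D[i, j], j
    centers = np.where((rho > rho_min) & (delta > delta_min))[0]
    if order[0] not in centers:                  # every chain must end at a center
        centers = np.concatenate(([order[0]], centers))
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    for i in order:                              # denser points are labelled first
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers, rho, delta

X = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
     [5, 5], [5.1, 5], [5, 5.1], [5.1, 5.1]]
labels, centers, rho, delta = density_peak_clustering(X, d_c=0.5, rho_min=2, delta_min=1.0)
```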
[0119] The real-time user behavior analysis module 14 is specifically configured to: collect, in real time, the touch operation data generated by the current user during the current game, and perform time-series slicing and event alignment on it to generate the current time-series segment; calculate the multi-finger operation synchronization index of the current time-series segment, which includes at least one of: the average number of simultaneous touch points, the variance of the number of simultaneous touch points, the standard deviation of the time differences between multiple fingers pressing down, the average time difference between multiple fingers lifting up, and a stability parameter of the relative distance between touch points; calculate the Z-scores of this index relative to each user proficiency level in the statistical behavior baseline and use them as the deviation comparison result; extract the multidimensional temporal feature vector of the current time-series segment; and calculate the similarity between this vector and each typical operation pattern prototype in the deep behavior pattern library, using the resulting similarity values as the similarity matching result.
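A sketch of the two real-time comparisons, assuming the baseline is stored as {level: {index: (mean, std)}} and that cosine similarity is the similarity measure (the source does not name one). A small epsilon guards against a zero standard deviation; the names and values are hypothetical.

```python
import numpy as np

def z_scores(indices, baseline, eps=1e-9):
    """indices: {name: value}; baseline: {level: {name: (mean, std)}} -> {level: {name: z}}."""
    return {
        level: {name: (indices[name] - mu) / max(sd, eps) for name, (mu, sd) in params.items()}
        for level, params in baseline.items()
    }

def cosine_similarities(vec, prototypes):
    """Cosine similarity between the current feature vector and each prototype."""
    v = np.asarray(vec, dtype=float)
    sims = {}
    for k, p in prototypes.items():
        p = np.asarray(p, dtype=float)
        sims[k] = float(v @ p / (np.linalg.norm(v) * np.linalg.norm(p)))
    return sims

z = z_scores({"m": 1.6}, {"beginner": {"m": (1.2, 0.2)}, "expert": {"m": (2.2, 0.2)}})
sims = cosine_similarities([1, 0], {0: [1, 0], 1: [0, 1], 2: [1, 1]})
```

Cosine similarity can be negative, so values would need clipping at 0 before being normalized into a matching probability distribution.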
[0120] Specifically, the time-series prediction module 15 is configured to: extract from the behavior sample library, as input features, the early operation sequence within a preset time window of each behavior sample, where the early operation sequence includes the touch point coordinate subsequence, pressure value subsequence, and touch start and end timestamp subsequence within that window; extract the user proficiency level of each behavior sample as the first output label and the user self-rated focus score of each behavior sample as the second output label; construct an initial time-series prediction model using a long short-term memory network, the model including an input layer, a temporal feature extraction layer, and an output layer, where the output layer includes a classification branch that outputs the proficiency level and a regression branch that outputs the focus score; and, with the input features as model input and the first and second output labels as supervision labels, train the initial model in a supervised manner until the classification loss of the classification branch and the regression loss of the regression branch converge, thereby obtaining the trained time-series prediction model.
[0121] The fusion analysis module 16 is specifically configured to: convert the Z-scores in the deviation comparison result into a posterior probability distribution of the current user over the user proficiency levels; normalize the similarity values in the similarity matching result to obtain a matching probability distribution between the current user's operation pattern and each typical operation pattern prototype; map, based on the user self-rated focus tags associated with each typical operation pattern prototype stored in the deep behavior pattern library, the matching probability distribution to a prior distribution of the focus score; use the predicted proficiency level output by the time-series prediction model as the observed likelihood of the current user's proficiency level, and the predicted focus score output by the time-series prediction model as the observed likelihood of the current user's focus score; fuse, with weights, the posterior probability distribution, the prior distribution, the observed likelihood of the proficiency level, and the observed likelihood of the focus score using a Bayesian fusion formula to obtain the fused posterior probability that the current user belongs to each user proficiency level; select the user proficiency level with the largest fused posterior probability as the current user's operational proficiency level; and calculate, based on the fused posterior probability and the prior distribution of the focus score, the posterior expected value of the focus score, which is used as the current user's focus score.
[0122] It should be noted that each of the above embodiments is described with its own emphasis; for parts not described in detail in one embodiment, refer to the relevant descriptions in the other embodiments.
[0123] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0124] The present invention is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each process and/or block of the flowcharts and/or block diagrams, and combinations of processes and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus produce a device for implementing the functions specified in one or more processes and/or one or more blocks of Figure 1.
[0125] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction device, the instruction device implementing the functions specified in one or more processes and/or one or more blocks of Figure 1.
[0126] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, causing a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes and/or one or more blocks of Figure 1.
[0127] Although preferred embodiments of the invention have been described, those skilled in the art, once they have learned the basic inventive concept, can make other changes and modifications to these embodiments.
[0128] Obviously, those skilled in the art can make various modifications and variations to this invention without departing from its spirit and scope. Therefore, if these modifications and variations fall within the scope of this invention and its equivalents, the present invention is also intended to encompass them.
Claims
1. A method for analyzing mobile game user behavior based on big data analytics, characterized in that the method comprises: collecting historical touch operation data from a massive number of mobile game users, and performing time-series slicing and event alignment processing on the historical touch operation data to construct a behavior sample library; based on the behavior sample library, statistically analyzing the distribution parameters of the multi-finger operation synchronization indices under each proficiency level to construct a statistical behavior baseline; clustering the operation patterns in the behavior sample library to extract typical operation pattern prototypes, and constructing a deep behavior pattern library based on the typical operation pattern prototypes; collecting the current user's touch operation data in real time to generate a current time segment, performing deviation comparison between the current time segment and the statistical behavior baseline and similarity matching between the current time segment and the deep behavior pattern library, to obtain a deviation comparison result and a similarity matching result; inputting the current time segment into a pre-trained time-series prediction model, which outputs a proficiency level prediction value and a focus score prediction value; and fusing the deviation comparison result, the similarity matching result, the proficiency level prediction value, and the focus score prediction value using a Bayesian fusion method to obtain the current user's operation proficiency level and focus score.
2. The mobile game user behavior analysis method based on big data analysis according to claim 1, characterized in that collecting historical touch operation data from a massive number of mobile game users, and performing time-series slicing and event alignment processing on the historical touch operation data to construct a behavior sample library, comprises: collecting the raw touch event streams generated by a massive number of mobile game users during historical gameplay, the raw touch event streams including at least the touch point coordinates, pressure value, and touch start and end timestamps of each touch event; collecting in-game event tags and user self-rated focus tags that are time-synchronized with the raw touch event streams, the in-game event tags including at least victory event tags, defeat event tags, and level completion event tags; slicing the raw touch event streams in time according to a preset fixed-duration window to obtain multiple time segments; for each time segment, performing temporal correlation analysis between the touch event sequence within that time segment and the in-game event tag sequence within the same time window to obtain a touch-event alignment sequence; and taking the touch-event alignment sequence, together with the in-game event tags and user self-rated focus tags associated with that time segment, as one behavior sample, the set of all behavior samples constituting the behavior sample library.
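The fixed-window slicing and sample packaging in claim 2 can be sketched in Python as follows. This is a minimal illustration, assuming that each touch event is assigned to a window by its start timestamp, that in-game event tags arrive as (timestamp, tag) pairs, and that self-rated focus tags are available per window index; the names `TouchEvent`, `slice_by_window`, and `build_samples` are hypothetical, and the event-alignment step is omitted here.

```python
from dataclasses import dataclass


@dataclass
class TouchEvent:
    x: float
    y: float
    pressure: float
    t_start: float  # seconds
    t_end: float


def slice_by_window(events, window, t0):
    """Group events into fixed-duration windows keyed by window index,
    assigning each event by its start timestamp (an assumption)."""
    slices = {}
    for e in sorted(events, key=lambda ev: ev.t_start):
        idx = int((e.t_start - t0) // window)
        slices.setdefault(idx, []).append(e)
    return slices


def build_samples(events, game_tags, focus_tags, window):
    """game_tags: list of (timestamp, tag); focus_tags: {window index: rating}."""
    if not events:
        return []
    t0 = min(e.t_start for e in events)
    slices = slice_by_window(events, window, t0)
    samples = []
    for idx in sorted(slices):
        lo, hi = t0 + idx * window, t0 + (idx + 1) * window
        # Attach the in-game event tags that fall inside the same window.
        tags = [tag for t, tag in game_tags if lo <= t < hi]
        samples.append({"window": idx, "events": slices[idx],
                        "game_tags": tags, "focus": focus_tags.get(idx)})
    return samples
```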
3. The mobile game user behavior analysis method based on big data analysis according to claim 2, characterized in that performing, for each time segment, temporal correlation analysis between the touch event sequence within that time segment and the in-game event tag sequence within the same time window to obtain a touch-event alignment sequence comprises: extracting feature change points from the touch event sequence, the feature change points including at least one of: abrupt change points in pressure intensity, acceleration extreme points of the touch point coordinates, and points at which the number of touch points increases or decreases; taking the moment at which an in-game event tag appears, or changes from one state to another, as an event occurrence time; calculating the cross-correlation function between the feature change points and the event occurrence times, sliding the time offset to find the offset that maximizes the cross-correlation function, and taking that offset as the response delay between the touch events and the game events; and, using the response delay as the alignment offset, shifting the touch event sequence forward or backward by the response delay so that the touch events are aligned with the corresponding game events on the timeline, thereby obtaining the touch-event alignment sequence.
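The delay estimation in claim 3 can be illustrated by sampling both the feature change points and the event occurrence times as binary impulse trains at a fixed step `dt`, then sliding the lag to maximize the discrete cross-correlation. The sketch below is an assumption-laden illustration: the sampling step, the search range `max_lag`, the sign convention (a positive delay means game events occur after the touch change points) and the function names are all hypothetical, and extracting the change points themselves is out of scope.

```python
def to_impulses(times, t0, t1, dt):
    # Binary impulse train: 1 at each sampled instant that holds a time point.
    n = int(round((t1 - t0) / dt)) + 1
    train = [0] * n
    for t in times:
        i = int(round((t - t0) / dt))
        if 0 <= i < n:
            train[i] = 1
    return train


def best_lag(change_points, event_times, t0, t1, dt, max_lag):
    """Return the delay (seconds) maximizing r(lag) = sum_i a[i] * b[i + lag]."""
    a = to_impulses(change_points, t0, t1, dt)
    b = to_impulses(event_times, t0, t1, dt)
    best, best_score = 0, -1
    for lag in range(-max_lag, max_lag + 1):
        score = sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))
        if score > best_score:  # ties keep the earliest (most negative) lag
            best, best_score = lag, score
    return best * dt


def align(touch_times, delay):
    # Shift touch timestamps by the estimated response delay.
    return [t + delay for t in touch_times]
```

For example, change points at 1.0, 2.0, and 3.5 s against events at 1.2, 2.2, and 3.7 s yield a delay of about 0.2 s, and shifting the touch timestamps by that amount places them on the game events.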
4. The mobile game user behavior analysis method based on big data analysis according to claim 1, characterized in that statistically analyzing, based on the behavior sample library, the distribution parameters of the multi-finger operation synchronization indices under each proficiency level to construct a statistical behavior baseline comprises: extracting from the behavior sample library the user proficiency level corresponding to each behavior sample, the user proficiency levels being pre-divided into multiple discrete levels based on the user's game rank data, game win rate data, or operation score data; for each user proficiency level, selecting from the behavior sample library all behavior samples belonging to that level to form the behavior sample subset corresponding to that level; for each behavior sample in each behavior sample subset, calculating the multi-finger operation synchronization indices of that behavior sample, the indices including at least one of: the mean number of simultaneous touch points, the variance of the number of simultaneous touch points, the standard deviation of the time differences between multiple fingers pressing down, the mean time difference between multiple fingers lifting up, and a stability parameter of the relative distance between touch points; statistically analyzing the multi-finger operation synchronization indices of all behavior samples under each user proficiency level, calculating the arithmetic mean and standard deviation of each index, and using them as the distribution parameters of that index under that user proficiency level; and combining the distribution parameters of all user proficiency levels to form the statistical behavior baseline.
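A minimal Python sketch of the baseline construction in claim 4 follows, covering three of the listed indices. It assumes that simultaneous touch counts are sampled at regular instants, that each multi-finger "chord" is given as a list of finger-down timestamps whose spread (latest minus earliest) is its press time difference, and that population standard deviations are used; the function names and index keys are hypothetical.

```python
import statistics


def sync_indices(touch_counts, press_times_per_chord):
    """touch_counts: simultaneous touch points at each sampled instant.
    press_times_per_chord: finger-down timestamps for each multi-finger chord."""
    diffs = [max(c) - min(c) for c in press_times_per_chord if len(c) > 1]
    return {
        "mean_touch_count": statistics.fmean(touch_counts),
        "var_touch_count": statistics.pvariance(touch_counts),
        "std_press_diff": statistics.pstdev(diffs) if diffs else 0.0,
    }


def build_baseline(samples):
    """samples: iterable of (level, indices dict).
    Returns {level: {index name: (arithmetic mean, standard deviation)}}."""
    by_level = {}
    for level, idx in samples:
        by_level.setdefault(level, []).append(idx)
    baseline = {}
    for level, rows in by_level.items():
        baseline[level] = {
            name: (statistics.fmean(r[name] for r in rows),
                   statistics.pstdev([r[name] for r in rows]))
            for name in rows[0]
        }
    return baseline
```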
5. The mobile game user behavior analysis method based on big data analysis according to claim 1, characterized in that clustering the operation patterns in the behavior sample library to extract typical operation pattern prototypes, and constructing a deep behavior pattern library based on the typical operation pattern prototypes, comprises: extracting the touch point coordinate sequence, pressure value sequence, and touch start and end timestamp sequence from each behavior sample; normalizing each of these sequences separately, then concatenating the normalized touch point coordinate sequence, normalized pressure value sequence, and normalized touch start and end timestamp sequence along the time axis to form a multidimensional temporal feature vector; dividing the multidimensional temporal feature vectors into multiple clusters; for each cluster, calculating the arithmetic mean of all multidimensional temporal feature vectors within the cluster and using that mean as the typical operation pattern prototype of the cluster; and storing each typical operation pattern prototype in association with the in-game event tags and user self-rated focus tags of the behavior samples in the cluster to which it belongs, thereby constructing the deep behavior pattern library.
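The feature construction and prototype extraction in claim 5 can be sketched as below. The normalization method is not specified in the document, so min-max scaling per channel is an assumption, as is the interpretation of "concatenating along the time axis" as stacking the channels at each time step and flattening in time order. Sequences are assumed to have equal length, cluster labels are taken as given (for instance from the clustering of claim 6), and all function names are hypothetical.

```python
def min_max(seq):
    # Scale a channel to [0, 1]; a constant channel maps to all zeros.
    lo, hi = min(seq), max(seq)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in seq]


def feature_vector(xs, ys, pressures, t_starts, t_ends):
    channels = [min_max(c) for c in (xs, ys, pressures, t_starts, t_ends)]
    # Stack the channels at each time step, then flatten in time order.
    return [v for step in zip(*channels) for v in step]


def build_pattern_library(vectors, labels, game_tags, focus_tags):
    """Prototype = arithmetic mean of the vectors in each cluster, stored
    together with the tags of that cluster's behavior samples."""
    clusters = {}
    for vec, lab, g, f in zip(vectors, labels, game_tags, focus_tags):
        clusters.setdefault(lab, []).append((vec, g, f))
    library = {}
    for lab, members in clusters.items():
        dim = len(members[0][0])
        proto = [sum(m[0][d] for m in members) / len(members) for d in range(dim)]
        library[lab] = {"prototype": proto,
                        "game_tags": [m[1] for m in members],
                        "focus_tags": [m[2] for m in members]}
    return library
```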
6. The mobile game user behavior analysis method based on big data analysis according to claim 5, characterized in that dividing the multidimensional temporal feature vectors into multiple clusters comprises: calculating the Euclidean distance between each multidimensional temporal feature vector and every other multidimensional temporal feature vector to construct a distance matrix; based on the distance matrix, calculating the local density of each multidimensional temporal feature vector, the local density being the number of other multidimensional temporal feature vectors whose distance from it is less than a preset cutoff distance; based on the distance matrix, taking, for each multidimensional temporal feature vector, the minimum Euclidean distance between it and all other multidimensional temporal feature vectors having a higher local density as its minimum distance; drawing a decision graph with the local density of each multidimensional temporal feature vector as the first coordinate and its minimum distance as the second coordinate; identifying as cluster center points the multidimensional temporal feature vectors in the decision graph whose local density is greater than a preset density threshold and whose minimum distance is greater than a preset distance threshold; and assigning each multidimensional temporal feature vector that is not a cluster center point to the cluster of the nearest cluster center point having a higher local density, thereby obtaining multiple clusters.
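The density-peak clustering of claim 6 can be illustrated with the Python sketch below, which computes the local density with a cutoff distance, the minimum distance to a higher-density vector, and the cluster centers from the two thresholds. The function name `density_peaks` and its parameters are hypothetical. For the assignment step it uses the common density-peaks rule of inheriting the label of the nearest higher-density vector, processed in decreasing density order, which is one reading of the claim's wording rather than a confirmed detail. Ties in local density are broken by index so that "higher" is a strict order, and the global density peak takes the largest distance as its minimum distance, a usual convention.

```python
import math


def density_peaks(points, dc, rho_min, delta_min):
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    # Decreasing density; ties broken by index so "higher" is a strict order.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_higher = [None] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the global density peak
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], nearest_higher[i] = dist[i][j], j
    # Decision-graph thresholds select the cluster centers.
    centers = [i for i in range(n) if rho[i] > rho_min and delta[i] > delta_min]
    labels = [-1] * n
    for c_id, i in enumerate(centers):
        labels[i] = c_id
    # Non-centers inherit the label of their nearest higher-density point.
    for i in order:
        if labels[i] == -1 and nearest_higher[i] is not None:
            labels[i] = labels[nearest_higher[i]]
    return centers, labels
```

Points can remain labeled -1 only if the thresholds exclude the global density peak itself.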
7. The mobile game user behavior analysis method based on big data analysis according to claim 1, characterized in that collecting the current user's touch operation data in real time to generate a current time segment, performing deviation comparison between the current time segment and the statistical behavior baseline, and performing similarity matching with the deep behavior pattern library to obtain a deviation comparison result and a similarity matching result comprises: collecting in real time the touch operation data generated by the current user in the current game, and performing time-series slicing and event alignment processing on that data to generate the current time segment; calculating the multi-finger operation synchronization indices of the current time segment, the indices including at least one of: the mean number of simultaneous touch points, the variance of the number of simultaneous touch points, the standard deviation of the time differences between multiple fingers pressing down, the mean time difference between multiple fingers lifting up, and a stability parameter of the relative distance between touch points; calculating the Z-scores of the multi-finger operation synchronization indices of the current time segment relative to each user proficiency level in the statistical behavior baseline, and using the calculated Z-scores as the deviation comparison result; extracting the multidimensional temporal feature vector of the current time segment; and calculating the similarity between that multidimensional temporal feature vector and each typical operation pattern prototype in the deep behavior pattern library, and using the calculated similarity values as the similarity matching result.
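The two comparisons in claim 7 can be sketched as follows: a Z-score z = (x - μ) / σ of each current index against every proficiency level's baseline parameters, and a similarity score against each stored prototype. The document does not name the similarity measure, so cosine similarity is an assumption here, as is the `min_std` floor that keeps a zero standard deviation from causing a division by zero; the function names are hypothetical.

```python
import math


def z_scores(current, baseline, min_std=1e-6):
    """current: {index name: value}; baseline: {level: {index name: (mean, std)}}.
    Returns {level: {index name: z}} with z = (value - mean) / std."""
    result = {}
    for level, params in baseline.items():
        result[level] = {}
        for name, value in current.items():
            mean, std = params[name]
            result[level][name] = (value - mean) / max(std, min_std)
    return result


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def match_prototypes(vector, prototypes):
    # Similarity of the current feature vector to every stored prototype.
    return {pid: cosine_similarity(vector, p) for pid, p in prototypes.items()}
```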
8. The mobile game user behavior analysis method based on big data analysis according to claim 1, characterized in that constructing the time-series prediction model comprises: extracting from the behavior sample library, as input features, the early operation sequence of each behavior sample within a preset time window, the early operation sequence comprising the touch point coordinate subsequence, the pressure value subsequence, and the touch start and end timestamp subsequence within that window; extracting the user proficiency level corresponding to each behavior sample as a first output label, and the user self-rated focus score corresponding to each behavior sample as a second output label; constructing an initial time-series prediction model based on a long short-term memory network, the model comprising an input layer, a temporal feature extraction layer, and an output layer, where the output layer comprises a classification branch that outputs the proficiency level and a regression branch that outputs the focus score; and, taking the input features as model input and the first and second output labels as supervision labels, training the initial time-series prediction model in a supervised manner until both the classification loss function of the classification branch and the regression loss function of the regression branch converge, thereby obtaining the trained time-series prediction model.
9. The mobile game user behavior analysis method based on big data analysis according to claim 1, characterized in that fusing the deviation comparison result, the similarity matching result, the proficiency level prediction value, and the focus score prediction value using a Bayesian fusion method to obtain the current user's operation proficiency level and focus score comprises: converting the Z-scores in the deviation comparison result into a posterior probability distribution of the current user over the user proficiency levels; normalizing the similarity values in the similarity matching result and using them as the matching probability distribution between the current user's operation pattern and each typical operation pattern prototype; mapping the matching probability distribution to a prior distribution of the focus score, based on the user self-rated focus tags stored in the deep behavior pattern library in association with each typical operation pattern prototype; using the proficiency level predicted by the time-series prediction model as the observation likelihood of the current user's proficiency level, and the focus score predicted by the time-series prediction model as the observation likelihood of the current user's focus score; applying a Bayesian fusion formula to perform a weighted fusion of the posterior probability distribution, the prior distribution, the proficiency-level observation likelihood, and the focus-score observation likelihood, obtaining the fused posterior probability that the current user belongs to each user proficiency level; selecting the user proficiency level with the largest fused posterior probability as the current user's operation proficiency level; and calculating the posterior expected value of the focus score from the fused posterior probability and the prior distribution of the focus score, using that posterior expected value as the current user's focus score.
10. A mobile game user behavior analysis system based on big data analytics, characterized in that the system is used to perform the mobile game user behavior analysis method based on big data analysis according to any one of claims 1 to 9, and comprises: a behavior sample construction module, configured to collect historical touch operation data from a massive number of mobile game users and to perform time-series slicing and event alignment processing on the historical touch operation data to construct a behavior sample library; a statistical behavior baseline construction module, configured to statistically analyze, based on the behavior sample library, the distribution parameters of the multi-finger operation synchronization indices under each proficiency level and to construct the statistical behavior baseline; a deep behavior pattern library construction module, configured to cluster the operation patterns in the behavior sample library, extract typical operation pattern prototypes, and construct a deep behavior pattern library based on the typical operation pattern prototypes; a real-time user behavior analysis module, configured to collect the current user's touch operation data in real time, generate the current time segment, perform deviation comparison between the current time segment and the statistical behavior baseline, and perform similarity matching with the deep behavior pattern library to obtain the deviation comparison result and the similarity matching result; a time-series prediction module, configured to input the current time segment into a pre-trained time-series prediction model and output the proficiency level prediction value and the focus score prediction value; and a fusion analysis module, configured to fuse the deviation comparison result, the similarity matching result, the proficiency level prediction value, and the focus score prediction value using a Bayesian fusion method to obtain the current user's operation proficiency level and focus score.