A human resource intelligent analysis method based on multi-source data machine learning
By constructing a machine learning model based on multi-source data, dynamic knowledge influence profiles and future skills growth trajectories are generated, solving the problems of data isolation and staticity in traditional human resource analysis methods. This enables in-depth insights and forward-looking predictions of employees within the organization, improving the scientific nature and accuracy of talent management.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHENZHEN QIANHAI ZHONGKE DIGITAL TECH CO LTD
- Filing Date
- 2026-05-06
- Publication Date
- 2026-06-19
AI Technical Summary
Traditional human resource analysis methods rely on static, isolated data, which cannot effectively capture the dynamic development and actual contributions of employees within an organization, identify high-potential talent, assess team health, or provide early warnings of turnover risks. They also lack in-depth insights and forward-looking predictions.
The method builds a machine learning model on multi-source data: it acquires static profile data and interaction metadata of employees, constructs a multimodal organizational network graph, applies a graph embedding model to generate network fusion latent vectors, iterates on those vectors to classify employees into groups, and combines time series analysis and prediction models to generate dynamic knowledge influence profiles and future skills growth trajectories.
It reveals the collaborative relationships and influence of employees outside the formal structure, provides a more comprehensive and objective assessment perspective, supports timely talent management intervention and forward-looking guidance, reduces the risk of skills mismatch, and improves the accuracy of talent decisions.
Figure CN122243433A_ABST
Description
Technical Field
[0001] This invention relates to the fields of human resource management and artificial intelligence technology, specifically to a human resource intelligent analysis method based on machine learning from multi-source data.
Background Technology
[0002] In contemporary corporate management, human capital is widely recognized as an organization's most core asset. Therefore, the scientific and efficient identification, assessment, development, and retention of talent has become crucial to determining an organization's long-term competitiveness. To achieve this goal, Human Resource Analytics has emerged, aiming to provide objective evidence for talent management decisions through data-driven approaches.
[0003] Traditional human resource analysis methods, while driving the digital transformation of management to some extent, have significant limitations in their technical means and analytical dimensions, resulting in insufficient capabilities in in-depth insights and forward-looking predictions. These limitations are mainly reflected in the following aspects:
[0004] First, traditional methods rely heavily on structured, static employee profile data, such as resumes, educational backgrounds, salary grades, and periodic performance evaluations. While this data is fundamental, it is essentially a static snapshot of the past, failing to reflect an employee's dynamic development and actual contributions within the organization. Data sources are isolated from each other, forming "data silos"—for example, performance data is disconnected from daily collaboration data, and skills certification data is separated from actual project contribution data. This data-level fragmentation makes it difficult for analytical models to build a comprehensive and multi-dimensional understanding of employee capabilities and potential, often resulting in conclusions that are one-sided and lack depth.
[0005] Second, traditional analytical methods are typically based on formal organizational charts, such as analyzing departmental staff turnover or team performance. However, the actual operation of an organization is far more complex than its formal structure, relying on an "implicit network" comprised of informal collaboration, information flow, and knowledge sharing. Traditional techniques cannot effectively capture and quantify this network. Who are the key hubs for cross-departmental collaboration? Who are the tacit knowledge authorities in specific domains? Which employees or teams are on the periphery of information transmission? Traditional methods cannot answer these questions, which are crucial for identifying high-potential talent, assessing team health, and warning of turnover risks. They only see the "trees" (individuals) while ignoring the "forest" (the true position and role of individuals within the organizational network).
Summary of the Invention
[0006] To address the shortcomings of existing technologies, this invention provides a human resource intelligent analysis method based on machine learning from multi-source data, in order to solve the problems mentioned in the background section.
[0007] To achieve the above objectives, the present invention provides the following technical solution: a human resource intelligent analysis method based on machine learning from multi-source data, comprising the following steps:
[0008] S1. Obtain the static profile data and individual historical data of the i-th employee in the organization, construct a machine learning model to encode the data, and generate an individual basic potential vector;
[0009] S2. Obtain the interaction metadata of the internal collaborative network of the organization, and construct a multimodal organizational network graph with the i-th employee as the node and the collaborative relationship as the edge based on the organizational structure data and interaction metadata; apply the graph embedding model to the multimodal organizational network graph to learn and generate a network fusion latent vector that integrates its network topology and neighborhood collaboration information for the i-th employee node, and iterate based on the network fusion latent vector to construct a category probability vector Pᵢ, so that the i-th employee is assigned to the corresponding group;
[0010] S3. For the corresponding group, collect the interaction metadata sequence of the group in the knowledge base and collaboration platform as it changes over time; input the interaction metadata sequence into the time series analysis model, and construct a dynamic knowledge influence profile by learning its interaction patterns and evolution trends in the information flow network.
[0011] S4. For the corresponding groups, collect historical data sequences of their skill acquisition and skill demand data for future project planning; input the historical data sequences of skill acquisition into the time series prediction model to generate future skill growth trajectories, and align and analyze the future skill growth trajectories with the skill demand data to generate personalized skill growth prediction maps to guide talent development.
[0012] Preferably, static profile data includes at least one or more of the following: education, length of service, job level, and professional certifications; individual historical data includes at least one or more of the following: performance rating sequence, training course completion records, project participation history, and promotion records.
[0013] The machine learning model is an autoencoder network model;
[0014] Before the machine learning model encodes the static profile data and individual historical data, a preprocessing step is also included: encoding the categorical data in the static profile data, including education level and job grade; and normalizing or standardizing the numerical data in the individual historical data, including tenure and performance rating, to convert all data features into numerical inputs with uniform dimensions, from which the individual basic potential vector is formed.
[0015] Preferably, the aforementioned interaction metadata includes at least one or more of the following: email metadata, instant messaging metadata, code repository collaboration records, and shared document editing history;
[0016] Organizational structure data defines the hierarchical structure of formal reporting relationships among employees within an organization;
[0017] The specific steps involved in constructing a multimodal organizational network diagram include:
[0018] Map each employee within the organization to a unique node i in the graph;
[0019] Based on organizational structure data, a first type of directed edge is established between employee nodes with direct reporting relationships. The direction of the first type of directed edge is determined by the reporting relationship.
[0020] Based on interaction metadata, the frequency or duration of interactions between any two employee nodes i and j that have engaged in collaborative interactions is counted to obtain interaction statistics. When the interaction statistics exceed a preset interaction threshold, a second type of directed edge is established between nodes i and j, and a weight value $W_{ij}$ is assigned to the second type of directed edge, where $W_{ij}$ is a normalized representation of the interaction statistic.
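The graph construction steps above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `build_org_graph`, the employee-to-manager edge direction, the use of raw interaction counts as the statistic, and normalization by the largest retained count are all assumptions made for the example.

```python
from collections import Counter


def build_org_graph(reports_to, interactions, threshold):
    """Return (reporting_edges, collaboration_edges) for the organizational graph.

    reports_to:   dict mapping employee -> direct manager (formal structure)
    interactions: iterable of (sender, receiver) events from interaction metadata
    threshold:    preset interaction threshold for creating a collaboration edge
    """
    # First edge type: one directed edge per direct reporting relationship.
    # Assumption: the edge points from the employee to the manager.
    reporting_edges = [(emp, mgr) for emp, mgr in reports_to.items()]

    # Interaction statistic: number of events per ordered pair (i, j).
    counts = Counter(interactions)

    # Keep only pairs whose statistic exceeds the preset threshold.
    kept = {pair: n for pair, n in counts.items() if n > threshold}

    # Second edge type: W_ij is the count divided by the largest kept count
    # (assumed normalization; the source does not fix the scheme).
    max_count = max(kept.values(), default=1)
    collaboration_edges = {pair: n / max_count for pair, n in kept.items()}
    return reporting_edges, collaboration_edges


reports_to = {"B": "A", "C": "A", "D": "B"}
events = [("B", "C")] * 6 + [("C", "B")] * 3 + [("D", "C")] * 1
reporting, collaboration = build_org_graph(reports_to, events, threshold=2)
print(collaboration)  # {('B', 'C'): 1.0, ('C', 'B'): 0.5}
```

The single D-to-C event falls below the threshold, so no collaboration edge is created for that pair.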
[0021] Preferably, the steps for learning and generating the network fusion latent vector for the i-th employee node using the graph embedding model are as follows:
[0022] The graph embedding model is determined to be a graph neural network (GNN) model, and the individual basic potential vector generated for the i-th employee in S1 is used as the initial node feature of the corresponding node i in the GNN model, denoted as $h_i^{(0)}$;
[0023] The node representation is updated by performing K iterations on each node using the GNN model. The calculation of the k-th iteration (1≤k≤K) follows the formula:
[0024] $h_i^{(k)} = \mathrm{UPDATE}^{(k)}\big(h_i^{(k-1)},\ \mathrm{AGGREGATE}^{(k)}(\{h_j^{(k-1)} : j \in N(i)\})\big)$;
[0025] where $h_i^{(k)}$ is the node representation of node i after the k-th iteration; N(i) is the set of neighboring nodes of node i; $\mathrm{AGGREGATE}^{(k)}$ is the aggregation function for the k-th round, used to aggregate the feature information of neighboring nodes; $\mathrm{UPDATE}^{(k)}$ is the update function for the k-th round, used to combine the node's own features from the previous round with the aggregated neighbor features; and $h_i^{(K)}$, the vector output by the i-th employee node in the last layer after K iterations, serves as the network fusion latent vector.
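To make the iterative update concrete, the following sketch runs K rounds of neighbor aggregation in plain Python. It is a deliberately simplified stand-in: the aggregation function is assumed to be an element-wise mean and the update function a fixed 50/50 average of a node's own vector and the aggregated neighbor vector, whereas a trained GNN would use learned weights and nonlinearities. The name `gnn_embed` and the toy data are hypothetical.

```python
def gnn_embed(features, neighbors, num_iters):
    """Run K rounds of mean-aggregation message passing.

    features:  dict node -> initial feature vector (list of floats)
    neighbors: dict node -> list of neighboring nodes N(i)
    num_iters: K, the number of iterations
    """
    h = {node: list(vec) for node, vec in features.items()}
    for _ in range(num_iters):
        new_h = {}
        for node, own in h.items():
            nbrs = neighbors.get(node, [])
            if nbrs:
                # Aggregation (assumed): element-wise mean of the neighbors'
                # vectors from the previous round.
                agg = [sum(h[j][d] for j in nbrs) / len(nbrs)
                       for d in range(len(own))]
            else:
                agg = [0.0] * len(own)
            # Update (assumed): average of the node's own previous vector
            # and the aggregated neighbor vector.
            new_h[node] = [0.5 * o + 0.5 * a for o, a in zip(own, agg)]
        h = new_h
    return h


features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
fused = gnn_embed(features, neighbors, num_iters=1)
print(fused["A"])  # [0.75, 0.5]
```

After one round, node A's vector already mixes in information from B and C; running more rounds lets information from nodes further away in the graph reach each node.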
[0026] Preferably, the steps for classifying and identifying corresponding groups based on network fusion latent vectors are as follows:
[0027] The representation $h_i^{(K)}$ of node i after the K-th iteration is fed into a pre-trained downstream classifier model, which calculates the probability $P_i$ of the employee at node i belonging to each predefined category using the following formula:
[0028] $P_i = \mathrm{Softmax}(W_c\, h_i^{(K)} + b_c)$;
[0029] where $W_c$ and $b_c$ are the weight matrix and bias vector of the downstream classifier model; the classification score $W_c\, h_i^{(K)} + b_c$ is input into the Softmax function, which converts it into the classification probability values $P_i$;
[0030] When, in the category probability vector Pᵢ, the probability value corresponding to the high-potential or high-risk category exceeds the preset probability threshold $P_{th}$, the i-th employee is assigned to the corresponding group.
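The classification and thresholding step can be illustrated as follows. This is a sketch under stated assumptions: the classifier is taken to be a single linear layer followed by Softmax, the category names (including a hypothetical "regular" category), the weights, and the threshold value are invented for the example, and `classify` is not a name from the source.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedding, weights, bias, categories, watched, p_th):
    """Return (category probabilities, watched categories above the threshold)."""
    # Linear classification score: one row of weights per category.
    scores = [sum(w * x for w, x in zip(row, embedding)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(scores)
    # Assign the employee to a watched group only if its probability
    # exceeds the preset threshold.
    groups = [c for c, p in zip(categories, probs) if c in watched and p > p_th]
    return probs, groups


categories = ["high-potential", "high-risk", "regular"]
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # illustrative values only
bias = [0.0, 0.0, 0.0]
probs, groups = classify([1.0, 2.0], weights, bias, categories,
                         watched={"high-potential", "high-risk"}, p_th=0.6)
print(groups)  # ['high-risk']
```

Here the scores are [1, 2, 0], so the "high-risk" probability is about 0.665 and is the only one above the 0.6 threshold.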
[0031] Preferably, the specific steps for collecting and quantifying the interaction metadata sequence in S3 include: setting a uniform time window length Δt, dividing the interaction behavior of the i-th employee in the corresponding group on the knowledge base and collaboration platform into T time windows in chronological order to form an interaction behavior sequence; within each time window t (1≤t≤T), extracting and quantifying the interaction behavior of the i-th employee, and constructing a multi-dimensional interaction feature vector $x_i(t)$, expressed as follows:
[0032] $x_i(t) = [n_i^{c}(t),\ n_i^{e}(t),\ n_i^{v}(t),\ n_i^{r}(t),\ n_i^{m}(t)]$;
[0033] where $n_i^{c}(t)$ is the number of knowledge documents created by the current employee within time window t; $n_i^{e}(t)$ is the number of documents edited by the current employee; $n_i^{v}(t)$ is the number of documents viewed by the current employee; $n_i^{r}(t)$ is the number of comments and replies posted by the current employee; $n_i^{m}(t)$ is the number of times the employee mentions others during collaboration; the interaction feature vectors of the T time windows are arranged chronologically to form the interaction metadata sequence, recorded as $\{x_i(t)\}_{t=1}^{T}$.
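The windowing described above amounts to bucketing timestamped events and counting them by type. The sketch below assumes events arrive as `(timestamp, kind)` pairs with numeric timestamps (for example, days since a start date) and uses five hypothetical event labels matching the five counts; none of these names come from the source.

```python
KINDS = ("create", "edit", "view", "comment", "mention")


def interaction_sequence(events, start, window, num_windows):
    """Bucket events into T fixed-length windows and count each kind.

    events:      list of (timestamp, kind) pairs for one employee
    start:       timestamp at which the first window begins
    window:      window length (delta t), in the same unit as the timestamps
    num_windows: T, the number of windows
    """
    seq = [[0] * len(KINDS) for _ in range(num_windows)]
    for ts, kind in events:
        idx = int((ts - start) // window)
        # Ignore events outside the observation period or of unknown kind.
        if 0 <= idx < num_windows and kind in KINDS:
            seq[idx][KINDS.index(kind)] += 1
    return seq


events = [(0, "create"), (1, "edit"), (1, "view"),
          (8, "comment"), (9, "mention"), (9, "view"), (30, "edit")]
sequence = interaction_sequence(events, start=0, window=7, num_windows=2)
print(sequence)  # [[1, 1, 1, 0, 0], [0, 0, 1, 1, 1]]
```

Each inner list is one feature vector in chronological order; the event at timestamp 30 lies outside the two observed windows and is dropped.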
[0034] Preferably, the time series analysis model is a long short-term memory network, and the specific steps for inputting the interaction metadata sequence into the time series analysis model are as follows:
[0035] The time series analysis model processes the feature vectors $x_i(t)$ sequentially from 1 to T along time step t, and updates its internal hidden state $h_i(t)$ and cumulative knowledge state vector $c_i(t)$ at each time step t. The update process follows the formula below:
[0036] $(h_i(t),\ c_i(t)) = \mathrm{LSTM}(x_i(t),\ h_i(t-1),\ c_i(t-1))$;
[0037] where $h_i(t-1)$ and $c_i(t-1)$ are the hidden state and the cumulative knowledge state vector of the previous time step, respectively; the hidden state hᵢ(T) output by the time series analysis model at the last time step T after processing the entire sequence is used as a dynamic feature representation that incorporates time series information.
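For readers unfamiliar with long short-term memory networks, the sketch below spells out one standard LSTM step (forget, input, and output gates plus a candidate state) and the loop that yields the final hidden state. It uses the textbook gate equations, which the source does not write out, and untrained placeholder weights; `lstm_step`, `encode_sequence`, and `make_params` are hypothetical names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One standard LSTM step; each gate acts on the concatenation [h_prev, x]."""
    z = h_prev + x  # list concatenation

    def affine(name):
        W, b = params[name]
        return [sum(w * v for w, v in zip(row, z)) + bi for row, bi in zip(W, b)]

    f = [sigmoid(v) for v in affine("forget")]       # what to keep from c_prev
    i = [sigmoid(v) for v in affine("input")]        # how much new content to write
    o = [sigmoid(v) for v in affine("output")]       # how much of the state to expose
    g = [math.tanh(v) for v in affine("candidate")]  # candidate new content
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def encode_sequence(seq, params, hidden_size):
    """Process the whole sequence and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in seq:
        h, c = lstm_step([float(v) for v in x], h, c, params)
    return h


def make_params(hidden_size, input_size, value):
    """Placeholder parameters: every weight set to `value`, biases to zero."""
    gates = ("forget", "input", "output", "candidate")
    width = hidden_size + input_size
    return {g: ([[value] * width for _ in range(hidden_size)], [0.0] * hidden_size)
            for g in gates}


sequence = [[1, 1, 1, 0, 0], [0, 0, 1, 1, 1]]
h_T = encode_sequence(sequence, make_params(2, 5, 0.1), hidden_size=2)
```

In practice the weights would be learned from historical sequences, typically with a deep learning framework rather than hand-written loops.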
[0038] Preferably, the steps for constructing a dynamic knowledge influence profile are as follows: start from the dynamic feature representation hᵢ(T), i.e., the final value of the hidden state at t=T, the last time window;
[0039] Influence indicators are calculated through a pre-defined linear transformation layer to construct a dynamic knowledge influence profile Pᵢ.
[0040] Influence metrics include at least Knowledge Creation (KCSᵢ), Knowledge Dissemination (KDSᵢ), and Interaction Stability (ISSᵢ), which are calculated as follows:
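[0041] $KCS_i = w_{KC}^{\top} h_i(T) + b_{KC}$;
[0042] $KDS_i = w_{KD}^{\top} h_i(T) + b_{KD}$;
[0043] $ISS_i$ is computed from $\mathrm{Var}(\{N_i(t)\}_{t=1}^{T})$;
[0044] where $w_{KC}$ is the weight of knowledge creation; $w_{KD}$ is the weight of knowledge dissemination; $b_{KC}$ is the knowledge creation bias term; $b_{KD}$ is the knowledge dissemination bias term; $\mathrm{Var}(\cdot)$ is the variance operator, i.e., a function that calculates variance; and $N_i(t)$ is the total number of interactions by the employee within time window t, specifically the sum of the interaction counts recorded for that window;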
[0045] The final dynamic knowledge influence profile is denoted as: Pᵢ=[KCSᵢ,KDSᵢ,ISSᵢ].
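A small sketch of how the three indicators might be assembled is given below. The knowledge creation and dissemination scores follow the linear-layer description (a weight vector applied to the final hidden state plus a bias). The source only states that interaction stability uses the variance of the per-window interaction totals, so the mapping `1 / (1 + variance)` used here is purely an assumption, chosen so that steadier activity scores closer to 1. All names and numbers are illustrative.

```python
from statistics import pvariance


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def influence_profile(h_T, w_kc, b_kc, w_kd, b_kd, totals):
    """Return [KCS, KDS, ISS] for one employee.

    h_T:    final hidden state of the time series model
    totals: total interaction count in each of the T time windows
    """
    kcs = dot(w_kc, h_T) + b_kc  # knowledge creation score (linear layer)
    kds = dot(w_kd, h_T) + b_kd  # knowledge dissemination score (linear layer)
    # Assumed stability mapping: zero variance gives 1.0, high variance tends to 0.
    iss = 1.0 / (1.0 + pvariance(totals))
    return [kcs, kds, iss]


profile = influence_profile(
    h_T=[0.5, -0.2],
    w_kc=[1.0, 0.0], b_kc=0.1,
    w_kd=[0.0, 2.0], b_kd=0.5,
    totals=[3, 3, 3, 3],
)
```

With these illustrative inputs the profile is approximately [0.6, 0.1, 1.0]: perfectly even activity across windows yields the maximum stability under the assumed mapping.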
[0046] Preferably, the steps for collecting and quantifying historical data on employee skill acquisition are as follows: Define an organizational skill space containing M key skills; at discrete time points k, evaluate the proficiency of the i-th employee in the M key skills, forming the current employee's skill state vector Sᵢ(k) at time point k, represented as:
[0047] $S_i(k) = [s_{i1}(k), s_{i2}(k), \dots, s_{iM}(k)]$;
[0048] where $s_{if}(k)$ is the quantitative proficiency score of employee i at time point k on the f-th skill, with 1≤f≤M; the skill state vectors at K time points are arranged in chronological order to form a multidimensional skill time-series vector sequence $\{S_i(k)\}_{k=1}^{K}$;
[0049] Based on the project set of the organization's future planning, all key skills required to complete these projects are identified and mapped to an organizational skill space of M key skills. For each key skill f, a target demand level $d_f$ is set according to its importance, urgency, and frequency of demand in future projects, thereby constructing a target skill demand vector D, represented as $D = [d_1, d_2, \dots, d_M]$, where vector D is the target benchmark for alignment analysis;
[0050] The specific steps for generating a future skill development trajectory are as follows:
[0051] A multidimensional skill time-series vector sequence is input into a time-series prediction model. The time-series prediction model analyzes the historical proficiency data of each skill in the multidimensional skill time-series vector sequence to learn its inherent trends, periodicity, and autocorrelation. Based on the learned patterns, the time-series prediction model iteratively predicts the proficiency of each skill at multiple preset time steps in the future, and combines the predicted values of each skill into a future skill state vector sequence to form a future skill growth trajectory.
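As a rough illustration of trajectory generation, the sketch below extrapolates each skill independently with an ordinary least-squares linear trend. This is a simple stand-in, not the model the method prescribes: it captures trend only, whereas the described time-series prediction model is also meant to learn periodicity and autocorrelation. The function name `forecast_skills` and the sample history are hypothetical, and at least two historical time points are assumed.

```python
def forecast_skills(history, steps):
    """Extrapolate each skill with a least-squares linear trend.

    history: list of K skill state vectors (K >= 2), oldest first
    steps:   number of future time steps to predict
    """
    K, M = len(history), len(history[0])
    times = range(1, K + 1)
    t_mean = (K + 1) / 2
    denom = sum((t - t_mean) ** 2 for t in times)
    future = [[0.0] * M for _ in range(steps)]
    for f in range(M):
        ys = [history[k][f] for k in range(K)]
        y_mean = sum(ys) / K
        slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys)) / denom
        intercept = y_mean - slope * t_mean
        for s in range(steps):
            # Prediction for future time point K + 1 + s.
            future[s][f] = intercept + slope * (K + 1 + s)
    return future


history = [[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]]
trajectory = forecast_skills(history, steps=2)
print(trajectory)  # [[4.0, 2.0], [5.0, 2.0]]
```

The first skill grows by one level per period and is projected to continue doing so, while the second skill is flat and stays at 2.0.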
[0052] Preferably, at each future time step, the predicted skill state vector and the target skill requirement vector are compared item by item to calculate the difference between the predicted proficiency of each key skill and the target requirement level; based on the preset strategic importance weight of each skill, the difference is weighted and aggregated to generate a skill gap score that can quantitatively represent the gap between the employee and the organization's overall future needs at each future time step.
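The gap calculation can be sketched as a weighted sum of per-skill differences at each future step. One point is an assumption here: only shortfalls (target above prediction) are counted, so that surplus in one skill does not cancel a deficit in another; the source says only that differences are weighted and aggregated. The name `skill_gap_scores` and the numbers are illustrative.

```python
def skill_gap_scores(trajectory, demand, weights):
    """Return one skill gap score per future time step.

    trajectory: list of predicted skill state vectors, one per future step
    demand:     target skill demand vector D
    weights:    preset strategic importance weight of each skill
    """
    scores = []
    for state in trajectory:
        # Assumed rule: count only shortfalls relative to the target level.
        gaps = [max(0.0, d - s) for s, d in zip(state, demand)]
        scores.append(sum(w * g for w, g in zip(weights, gaps)))
    return scores


trajectory = [[4.0, 2.0], [5.0, 2.0]]
demand = [5.0, 3.0]
weights = [0.5, 0.5]
gap_scores = skill_gap_scores(trajectory, demand, weights)
print(gap_scores)  # [1.0, 0.5]
```

The score falls from 1.0 to 0.5 as the first skill reaches its target, leaving only the second skill's shortfall.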
[0053] The generated personalized skills growth prediction map is a structured dataset that integrates the following information: the employee's unique identifier, their complete skills acquisition history data sequence, the organization-level target skills demand vector, the dynamic knowledge influence profile Pᵢ, the future skills growth trajectory generated by the time-series prediction model, and the skills gap score sequence that changes over time and corresponds to the future skills growth trajectory.
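One way to hold the structured dataset described above is a small record type. The sketch below is an assumed layout, not a format defined by the source: the class name, field names, and the consistency checks are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SkillGrowthPredictionMap:
    """Container for the items the prediction map is said to integrate."""
    employee_id: str                      # unique identifier
    skill_history: List[List[float]]      # skill acquisition history sequence
    target_demand: List[float]            # organization-level target demand vector
    influence_profile: List[float]        # [KCS, KDS, ISS]
    future_trajectory: List[List[float]]  # predicted skill state vectors
    gap_scores: List[float]               # one gap score per future step

    def __post_init__(self):
        # Each future step should have exactly one gap score.
        if len(self.gap_scores) != len(self.future_trajectory):
            raise ValueError("gap_scores must align with future_trajectory")
        # Every skill vector should live in the same M-dimensional skill space.
        m = len(self.target_demand)
        for vec in self.skill_history + self.future_trajectory:
            if len(vec) != m:
                raise ValueError("skill vectors must match the demand vector length")


record = SkillGrowthPredictionMap(
    employee_id="E001",
    skill_history=[[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]],
    target_demand=[5.0, 3.0],
    influence_profile=[0.6, 0.1, 1.0],
    future_trajectory=[[4.0, 2.0], [5.0, 2.0]],
    gap_scores=[1.0, 0.5],
)
```

Validating alignment at construction time keeps the gap score sequence and the trajectory from drifting out of step when records are assembled from separate pipeline stages.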
[0054] This invention provides a human resource intelligent analysis method based on machine learning from multi-source data. It has the following beneficial effects:
[0055] This method overcomes the limitations of traditional analysis that relies on isolated, static data by integrating multi-source data, including static profiles, online interactions, and dynamic behaviors, and constructing a multimodal organizational network diagram. It reveals the actual collaborative relationships and influence of employees outside the formal structure, providing a more comprehensive and objective evaluation perspective. Secondly, through temporal analysis of employee knowledge interaction behavior, this method achieves continuous insights into employee contributions and status, changing the traditional model that relies on low-frequency, lagging performance evaluations and providing data support for timely talent management interventions. Finally, by predicting employee skill growth trajectories and quantifying their alignment with future organizational needs, it provides forward-looking guidance for talent development. This enables talent development planning to more closely serve organizational strategic goals, helps reduce the risks associated with skills mismatch, and improves the accuracy of talent decisions.
Attached Figure Description
[0056] Figure 1 is a schematic diagram of the steps of the present invention;
[0057] Figure 2 is a schematic diagram of steps S1-S2 of the present invention;
[0058] Figure 3 is a schematic diagram of steps S3-S4 of the present invention.
Detailed Implementation
[0059] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0060] Example 1
[0061] Referring to Figures 1 to 3, this invention provides a human resource intelligent analysis method based on machine learning from multi-source data, comprising the following steps:
[0062] S1. Obtain the static profile data and individual historical data of the i-th employee in the organization, construct a machine learning model to encode the data, and generate an individual basic potential vector;
[0063] S2. Obtain the interaction metadata of the internal collaborative network of the organization, and construct a multimodal organizational network graph with the i-th employee as the node and the collaborative relationship as the edge based on the organizational structure data and interaction metadata; apply the graph embedding model to the multimodal organizational network graph to learn and generate a network fusion latent vector that integrates its network topology and neighborhood collaboration information for the i-th employee node, and iterate based on the network fusion latent vector to construct a category probability vector Pᵢ, so that the i-th employee is assigned to the corresponding group;
[0064] S3. For the corresponding group, collect the interaction metadata sequence of the group in the knowledge base and collaboration platform as it changes over time; input the interaction metadata sequence into the time series analysis model, and construct a dynamic knowledge influence profile by learning its interaction patterns and evolution trends in the information flow network.
[0065] S4. For the corresponding groups, collect historical data sequences of their skill acquisition and skill demand data for future project planning; input the historical data sequences of skill acquisition into the time series prediction model to generate future skill growth trajectories, and align and analyze the future skill growth trajectories with the skill demand data to generate personalized skill growth prediction maps to guide talent development.
[0066] In this embodiment, the method overcomes the limitations of traditional analysis that relies on isolated, static data by integrating multi-source data such as static profiles, online interactions, and dynamic behaviors, and constructing a multimodal organizational network diagram. It reveals the actual collaborative relationships and influence of employees outside the formal structure, thus providing a more comprehensive and objective evaluation perspective. Secondly, through temporal analysis of employee knowledge interaction behavior, this method achieves continuous insight into employee contributions and status, changing the traditional model that relies on low-frequency, lagging performance evaluations, and providing data support for timely talent management intervention. Finally, by predicting employee skill growth trajectories and quantitatively aligning them with future organizational needs, it provides forward-looking guidance for talent development. This enables talent development planning to more closely serve organizational strategic goals, helps reduce the risks caused by skills mismatch, and improves the accuracy of talent decisions.
[0067] Example 2
[0068] This embodiment is an explanation based on Example 1. Referring to Figures 1 to 2, specifically, static profile data includes at least one or more of the following: education, length of service, job level, and professional certifications; individual historical data includes at least one or more of the following: performance rating sequence, training course completion records, project participation history, and promotion records.
[0069] The machine learning model is an autoencoder network model;
[0070] Before the machine learning model encodes the static profile data and individual historical data, a preprocessing step is also included: one-hot encoding is performed on categorical data in the static profile data, including education level and job grade; and normalization or standardization is performed on numerical data in the individual historical data, including tenure and performance rating, in order to convert all data features into numerical inputs with uniform dimensions, from which the individual basic potential vector is formed.
[0071] Specific data example: Defining encoding rules:
[0072] Educational background codes: {Bachelor's: [1,0,0], Master's: [0,1,0], Doctoral: [0,0,1]};
[0073] Job level codes: {Engineer (T1): [1,0,0], Senior Engineer (T2): [0,1,0], Technical Expert (T3): [0,0,1]};
[0074] Professional certification code: {None: [0], Yes: [1]};
[0075] Performance coding: {C: 1, B: 2, B+: 3, A: 4, S: 5};
[0076] Applying the encoding:
[0077] Zhang San's educational background is "Master's" -> [0,1,0];
[0078] Zhang San's job level is "Technical Expert (T3)" -> [0,0,1];
[0079] Zhang San's professional certification is "Yes" -> [1];
[0080] Zhang San's performance rating: "[A,B+,A]" -> "[4,3,4]"
[0081] Performing min-max normalization: suppose the following feature ranges are known from the data of all employees in the organization:
[0082] Tenure range: [0,20] years; Number of training courses range: [0,50] courses; Number of key projects range: [0,10] projects; Performance score range: [1,5];
[0083] Calculate Zhang San's normalized value:
[0084] Company tenure: (4-0) / (20-0)=0.20;
[0085] Number of training courses: (8-0) / (50-0)=0.16;
[0086] Number of key projects: (3-0) / (10-0)=0.30;
[0087] Performance sequence: [(4-1) / 4,(3-1) / 4,(4-1) / 4]->[0.75,0.50,0.75];
[0088] The final input vector X is formed by concatenating all the processed values in a predetermined order: X = [0,1,0,0,0,1,1,0.20,0.16,0.30,0.75,0.50,0.75]. This input vector X has 13 dimensions: 3 (education level) + 3 (job level) + 1 (certification) + 1 (tenure) + 1 (training) + 1 (project) + 3 (performance).
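The preprocessing in this worked example can be reproduced with a short script. The encoding tables and feature ranges are taken from the example above; the record layout, the field names, and the helper names `min_max` and `build_input_vector` are assumptions made for illustration.

```python
EDUCATION = {"Bachelor's": [1, 0, 0], "Master's": [0, 1, 0], "Doctoral": [0, 0, 1]}
JOB_LEVEL = {"T1": [1, 0, 0], "T2": [0, 1, 0], "T3": [0, 0, 1]}
PERFORMANCE = {"C": 1, "B": 2, "B+": 3, "A": 4, "S": 5}


def min_max(value, low, high):
    """Scale a value from the range [low, high] to [0, 1]."""
    return (value - low) / (high - low)


def build_input_vector(record):
    """One-hot encode categorical fields and min-max scale numerical ones."""
    x = []
    x += EDUCATION[record["education"]]                    # 3 dimensions
    x += JOB_LEVEL[record["job_level"]]                    # 3 dimensions
    x.append(1 if record["certified"] else 0)              # 1 dimension
    x.append(min_max(record["tenure_years"], 0, 20))       # tenure range [0, 20]
    x.append(min_max(record["training_courses"], 0, 50))   # courses range [0, 50]
    x.append(min_max(record["key_projects"], 0, 10))       # projects range [0, 10]
    x += [min_max(PERFORMANCE[g], 1, 5) for g in record["performance"]]  # 3 dims
    return x


zhang_san = {
    "education": "Master's", "job_level": "T3", "certified": True,
    "tenure_years": 4, "training_courses": 8, "key_projects": 3,
    "performance": ["A", "B+", "A"],
}
X = build_input_vector(zhang_san)
print(X)  # [0, 1, 0, 0, 0, 1, 1, 0.2, 0.16, 0.3, 0.75, 0.5, 0.75]
```

Fixing the concatenation order in one function keeps every employee's vector aligned dimension by dimension, which the encoder relies on.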
[0089] Step 3: Computation is performed using a pre-trained autoencoder (corresponding to claims 4 and 5).
[0090] Now, we input this 13-dimensional vector X into the encoder part of a pre-trained autoencoder model. The function of this encoder is to compress the high-dimensional input into a low-dimensional, information-condensed vector.
[0091] Model settings:
[0092] Input layer dimension: 13;
[0093] Bottleneck layer dimension (i.e., potential vector dimension): 4;
[0094] The encoder part can be simplified as a weight matrix W (4x13) and a bias vector b (4x1). The specific values in W and b are obtained by the model through learning from data from thousands of employees, and they represent the most important patterns in the data.
[0095] Calculation process (forward propagation): The formula for calculating the individual's basic potential vector V is: V = f(W × X^T + b) (where X^T is the transpose of X, and f is the activation function. For the sake of simplicity, it is temporarily regarded as a linear function, i.e., f(x) = x).
[0096] Example parameters (for demonstration purposes only): Set the learned weight matrix W and bias vector b as follows:
[0097] W = [[0.1, -0.5, 0.8, 0.2, -0.3, 0.9, 0.7, 0.6, 0.4, 0.8, 0.9, 0.5, 0.9],
[0.7, 0.2, -0.1, 0.8, 0.5, -0.2, 0.1, 0.3, 0.1, -0.2, 0.8, 0.9, 0.8],
[-0.2, 0.8, 0.1, -0.5, 0.9, 0.3, -0.1, 0.5, 0.7, 0.2, -0.4, -0.6, -0.4],
[0.5, 0.3, 0.2, 0.6, 0.4, 0.1, 0.8, -0.8, -0.2, 0.1, 0.3, 0.2, 0.3]];
[0098] b = [[0.1], [-0.2], [0.3], [0.05]];
[0099] Perform the calculation: multiply the weight matrix W by Zhang San's input vector X (as X^T), and then add the bias b. V1 = (0.1×0 + (-0.5)×1 + ... + 0.9×0.75) + 0.1 = 3.224;
[0100] V2 = (0.7×0 + 0.2×1 + ... + 0.8×0.75) + (-0.2) = 1.566;
[0101] V3 = ((-0.2)×0 + 0.8×1 + ... + (-0.4)×0.75) + 0.3 = 0.672;
[0102] V4 = (0.5×0 + 0.3×1 + ... + 0.3×0.75) + 0.05 = 1.638;
[0103] Step 4: Generate the final result
[0104] After the calculation is completed, we obtain the individual basic potential vector V of the employee "Zhang San":
[0105] V_ZhangSan = [3.224, 1.566, 0.672, 1.638];
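The forward pass V = f(W × X^T + b) with a linear f can be checked directly. The sketch below uses the demonstration values of W, b, and X exactly as printed in this example; the helper name `encode` is hypothetical, and a real encoder would normally apply a nonlinear activation and have several layers.

```python
W = [
    [0.1, -0.5, 0.8, 0.2, -0.3, 0.9, 0.7, 0.6, 0.4, 0.8, 0.9, 0.5, 0.9],
    [0.7, 0.2, -0.1, 0.8, 0.5, -0.2, 0.1, 0.3, 0.1, -0.2, 0.8, 0.9, 0.8],
    [-0.2, 0.8, 0.1, -0.5, 0.9, 0.3, -0.1, 0.5, 0.7, 0.2, -0.4, -0.6, -0.4],
    [0.5, 0.3, 0.2, 0.6, 0.4, 0.1, 0.8, -0.8, -0.2, 0.1, 0.3, 0.2, 0.3],
]
b = [0.1, -0.2, 0.3, 0.05]
X = [0, 1, 0, 0, 0, 1, 1, 0.20, 0.16, 0.30, 0.75, 0.50, 0.75]


def encode(W, b, x):
    """Linear encoder: one dot product per row of W, plus the matching bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


V = [round(v, 3) for v in encode(W, b, X)]
print(V)  # [3.224, 1.566, 0.672, 1.638]
```

With W, b, and X as printed, the product evaluates to approximately [3.224, 1.566, 0.672, 1.638]. Because the parameters are for demonstration only, the magnitudes carry no meaning beyond showing the 13-to-4 compression.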
[0106] This 4-dimensional dense vector is a highly concentrated and abstract representation of all 13 original features of Zhang San. Each of its dimensions contains complex relationships learned by the model from a large amount of data. For example, a higher value of V1 may represent the combined trait of "senior and excellent performance". This vector will be used as the input for subsequent more complex analyses (such as network analysis S2), and its information quality is much higher than the original data.
[0107] In this embodiment, the technical principle of step S1 of the present invention lies in using an autoencoder, a machine learning model, to perform deep feature learning and information compression on carefully preprocessed multi-source heterogeneous employee data. The core process is as follows: First, through preprocessing methods such as one-hot encoding and normalization, the original data of different types and dimensions, including static employee records (such as education level and job title) and dynamic history (such as performance and length of service), are transformed into a unified, standardized multi-dimensional numerical vector. Then, this high-dimensional vector is input into a pre-trained autoencoder model. The encoder part of this model, through its internally learned weight matrix, performs nonlinear transformation and dimensionality reduction on the input vector, ultimately outputting a low-dimensional, dense "individual basic potential vector."
[0108] This embodiment achieves effective fusion and objective quantification of multi-source heterogeneous data: traditional methods struggle to comprehensively process categorical, numerical, and sequential data. This method integrates all discrete, unstructured information into a unified digital framework through unified vectorization preprocessing. The autoencoder automatically discovers and determines the intrinsic relationships and relative importance between features during the learning process, avoiding the subjectivity of manually setting weights, thus constructing a more comprehensive and objective digital profile for each employee. The original high-dimensional input vector may contain a large amount of redundant information or noise. The training goal of the autoencoder is to reconstruct the original input as perfectly as possible from the compressed vector, which forces the encoder to learn the most representative and essential features in the data. Therefore, the generated low-dimensional potential vector is a highly condensed and refined essence of the original information, with an information density far higher than the original data, providing high-quality input for subsequent complex analysis. Step S1 maps each employee to the same low-dimensional potential feature space. In this space, each employee is represented as a vector with the same dimension. This not only facilitates subsequent model processing but, more importantly, establishes a unified and standardized comparison benchmark for all employees. Organizations can use these vectors to perform similarity calculations, cluster analysis, and other operations, thereby conducting scientific talent assessment and comparison under a unified set of metrics.
[0109] Example 3
[0110] This embodiment is an explanation based on Example 1. Referring to Figures 1 to 2, specifically, interaction metadata includes at least one or more of the following: email metadata, instant messaging metadata, code repository collaboration records, and shared document editing history.
[0111] Organizational structure data defines the hierarchical structure of formal reporting relationships among employees within an organization;
[0112] The specific steps involved in constructing a multimodal organizational network diagram include:
[0113] Map each employee within the organization to a unique node i in the graph;
[0114] Based on organizational structure data, a first type of directed edge is established between employee nodes with direct reporting relationships. The direction of the first type of directed edge is determined by the reporting relationship.
[0115] Based on interaction metadata, the frequency or duration of interactions between any two employee nodes i and j that have engaged in collaborative interactions is counted to obtain interaction statistics. When the interaction statistics exceed a preset interaction threshold, a second type of directed edge is established between nodes i and j, and a weight value $W_{ij}$ is assigned to the second type of directed edge, where $W_{ij}$ is a normalized representation of the interaction statistic.
[0116] The specific steps for learning and generating the network fusion latent vector for the i-th employee node using the graph embedding model are as follows:
[0117] The graph embedding model is determined to be a graph neural network (GNN) model, and the individual basic potential vector generated for the i-th employee in S1 is used as the initial node feature of the corresponding node i in the GNN model, denoted as $h_i^{(0)}$;
[0118] The node representations are updated by performing K iterations over every node with the graph neural network (GNN) model. The calculation of the k-th iteration (1≤k≤K) follows the formula:
[0119] $h_i^{(k)} = \mathrm{COMBINE}^{(k)}\left(h_i^{(k-1)},\ \mathrm{AGGREGATE}^{(k)}\left(\{h_j^{(k-1)} : j \in N(i)\}\right)\right)$;
[0120] where $h_i^{(k)}$ is the node representation of node i after the k-th iteration; N(i) is the set of neighboring nodes of node i; $\mathrm{AGGREGATE}^{(k)}$ is the aggregation function for the k-th round, used to aggregate the feature information of neighboring nodes; $\mathrm{COMBINE}^{(k)}$ is the update function for the k-th round, used to combine the node's own features from the previous round with the aggregated neighbor features; and $h_i^{(K)}$, the vector output for the i-th employee node by the last layer after K iterations, is taken as the network fusion latent vector.
[0121] Preferably, the steps for classifying and identifying corresponding groups based on network fusion latent vectors are as follows:
[0122] The representation $h_i^{(K)}$ of node i after the K-th iteration is fed into a pre-trained downstream classifier model, which calculates the probability $P_i$ of the employee at node i belonging to each predefined category using the following formula:
[0123] $P_i = \mathrm{Softmax}\left(h_i^{(K)} W_c + b_c\right)$;
[0124] where $W_c$ and $b_c$ are the weight matrix and bias vector of the downstream classifier model; the classification score $h_i^{(K)} W_c + b_c$ is input into the Softmax function, which converts it into the category probability vector $P_i$;
[0125] When, in the category probability vector Pᵢ, the probability value corresponding to the high-potential or high-risk category exceeds the preset probability threshold P_th, the i-th employee is assigned to the corresponding group.
[0126] Specific data examples:
[0127] Initial data settings:
[0128] Employees and individual basic potential vectors (from S1): the analysis is conducted on a small team of four employees:
[0129] Employee 1 (Zhang): Team Manager;
[0130] Employee 2 (Ding): Senior Engineer;
[0131] Employee 3 (Shi): Junior Engineer;
[0132] Employee 4 (Zhou): Collaborating engineer in a neighboring team;
[0133] Through step S1, a 3-dimensional individual basic potential vector hᵢ⁰ has been generated for each employee. This vector encodes static data such as education, performance, and project experience.
[0134] h1⁰ (Zhang): [0.8, 0.7, 0.9] (indicating strong overall ability);
[0135] h2⁰ (Ding): [0.7, 0.8, 0.6] (indicating outstanding technical skills);
[0136] h3⁰ (Shi): [0.5, 0.4, 0.5] (indicating a stage of development);
[0137] h4⁰ (Zhou): [0.6, 0.6, 0.7] (indicating balanced abilities);
[0138] Organizational structure data: Zhang is the direct superior of Ding and Shi.
[0139] Interaction metadata (statistical period: one month): the total number of emails and instant messages exchanged between each pair of employees is used as the interaction statistic. Zhang ↔ Ding: 150 times; Zhang ↔ Shi: 120 times; Ding ↔ Shi: 80 times; Ding ↔ Zhou: 60 times;
[0140] Zhang ↔ Zhou: 10 times;
[0141] Preset parameters: Interaction threshold: 50 times (if the number of interactions is less than this value, it is not considered to constitute a strong collaborative relationship);
[0142] Downstream classifier categories: {High potential, core employees, high risk};
[0143] Probability threshold P_th: 0.80;
[0144] Construct a multimodal organizational network diagram;
[0145] Create nodes: Create nodes 1, 2, 3, and 4 for Zhang, Ding, Shi, and Zhou.
[0146] Establish the first type of directed edge (reporting relationship); based on the organizational structure, establish edges from reporting superiors to subordinates:
[0147] 1→2 (Zhang → Ding); 1→3 (Zhang → Shi);
[0148] Establish a second type of directed edge (cooperative relationship);
[0149] Compare interaction statistics with interaction threshold (50):
[0150] Zhang-Ding (150>50): Establish edge 1↔2;
[0151] Zhang-Shi (120>50): Establish edge 1↔3;
[0152] Ding-Shi (80>50): Establish edge 2↔3;
[0153] Ding-Zhou (60>50): Establish edge 2↔4;
[0154] Zhang-Zhou (10<50): No edge is established;
[0155] Calculate the weights Wᵢⱼ (using maximum value normalization, with a maximum value of 150):
[0156] W₁₂ = 150 / 150 = 1.0;
[0157] W₁₃ = 120 / 150 = 0.8;
[0158] W₂₃ = 80 / 150 ≈ 0.53;
[0159] W₂₄ = 60 / 150 = 0.4;
[0160] Final network graph structure: Nodes 1, 2, and 3 form a tight internal cooperative triangle, while node 2 acts as a bridge connecting to external node 4.
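The graph construction above can be reproduced with a short sketch. It assumes that pairs are stored as unordered node-id tuples and that the normalization divides by the largest retained count, as in the example; the names `build_collaboration_edges`, `interactions`, and `reporting` are illustrative only.

```python
# Monthly interaction counts from the example, keyed by unordered node pairs.
interactions = {(1, 2): 150, (1, 3): 120, (2, 3): 80, (2, 4): 60, (1, 4): 10}

# First type of edge: reporting relationship, superior -> subordinate.
reporting = [(1, 2), (1, 3)]

INTERACTION_THRESHOLD = 50


def build_collaboration_edges(counts, threshold):
    """Keep pairs whose count exceeds the threshold and max-normalize them."""
    strong = {pair: n for pair, n in counts.items() if n > threshold}
    if not strong:
        return {}
    peak = max(strong.values())
    return {pair: n / peak for pair, n in strong.items()}


weights = build_collaboration_edges(interactions, INTERACTION_THRESHOLD)
for pair, w in sorted(weights.items()):
    print(pair, round(w, 2))
```

The Zhang-Zhou pair (10 interactions) falls below the threshold and therefore produces no collaboration edge.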
[0161] C. Apply the graph embedding model (taking Ding / node 2 as an example, K=1 rounds of iteration);
[0162] Initial node features: the hᵢ⁰ generated in S1 is used as the initial feature for the GNN; Ding's initial feature is h2⁰ = [0.7, 0.8, 0.6].
[0163] GNN iterative computation (computing h2¹): neighbor node set N(2): Ding's neighbors are Zhang (1), Shi (3), and Zhou (4).
[0164] AGGREGATE (aggregate neighbor information): a weighted average is used here, with the collaboration weights Wᵢⱼ applied to the initial features hⱼ⁰ of the neighbor nodes. The weights are treated as symmetric, so W₂₁ = W₁₂.
[0165] AGG-Neighbors = (W₂₁ × h1⁰ + W₂₃ × h3⁰ + W₂₄ × h4⁰) / (W₂₁ + W₂₃ + W₂₄);
[0166] AGG-Neighbors=(1.0×[0.8,0.7,0.9]+0.53×[0.5,0.4,0.5]+0.4×[0.6,0.6,0.7]) / (1.0+0.53+0.4);
[0167] AGG-Neighbors=([0.8,0.7,0.9]+[0.265,0.212,0.265]+[0.24,0.24,0.28]) / 1.93;
[0168] AGG-Neighbors=[1.305,1.152,1.445] / 1.93≈[0.676,0.597,0.749];
[0169] This aggregate vector represents the "average profile" of Ding's collaborative environment, and it can be seen that he was significantly influenced by the high-potential manager Zhang.
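The weighted-average aggregation for Ding can be checked with the sketch below. It assumes symmetric collaboration weights stored per unordered pair and uses the rounded weight 0.53 for the Ding-Shi edge, as the example does; `weighted_mean_aggregate` is a hypothetical helper name.

```python
def weighted_mean_aggregate(node, features, weights):
    """Weighted average of neighbor features using collaboration weights."""
    dim = len(features[node])
    total = [0.0] * dim
    weight_sum = 0.0
    for (a, b), w in weights.items():
        if node not in (a, b):
            continue  # edge does not touch this node
        neighbor = b if a == node else a
        weight_sum += w
        for d in range(dim):
            total[d] += w * features[neighbor][d]
    if weight_sum == 0.0:
        raise ValueError("node has no collaboration neighbors")
    return [t / weight_sum for t in total]


# Initial node features h_i^0 from step S1 (values from the example).
features = {
    1: [0.8, 0.7, 0.9],  # Zhang
    2: [0.7, 0.8, 0.6],  # Ding
    3: [0.5, 0.4, 0.5],  # Shi
    4: [0.6, 0.6, 0.7],  # Zhou
}
weights = {(1, 2): 1.0, (1, 3): 0.8, (2, 3): 0.53, (2, 4): 0.4}

agg_2 = weighted_mean_aggregate(2, features, weights)
print([round(v, 3) for v in agg_2])
```

Only the AGGREGATE step is shown; the COMBINE step depends on trained parameters that the example does not specify.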
[0170] COMBINE (update self-representation): Ding's feature from the previous round, h2⁰, is combined with the aggregated neighbor features AGG-Neighbors. In GNNs, this is typically achieved through a small neural network layer. To simplify the example, we assume that the update function fuses the information from both, generating a new vector h2¹.
[0171] h2¹ = UpdateFunction(h2⁰, AGG-Neighbors);
[0172] h2¹=UpdateFunction([0.7,0.8,0.6],[0.676,0.597,0.749]);
[0173] After the update, Ding's feature vector is enhanced by incorporating information from its strong neighbors (especially Zhang). A possible output result is:
[0174] h2¹=[0.75,0.78,0.70];
[0175] This new vector h2¹ is Ding's network fusion potential vector, which not only includes Ding's individual capabilities, but also incorporates his position in the organizational network and the influence of the collaborative environment.
[0176] Categorize and identify employee groups (continuing with Mr. Ding as an example);
[0177] Input downstream classifier: Input Ding's network fusion latent vector h2¹=[0.75,0.78,0.70] into the pre-trained classifier.
[0178] Calculate the class assignment probability P2; classifier internal parameters (set values):
[0179] Weight matrix Wc(3x3): [[0.9,0.2,0.8],[0.1,0.7,0.2],[0.1,0.1,0.1]];
[0180] Bias vector bc(1x3): [0.1, 0.05, 0.0];
[0181] Calculate logits: z = h2¹ × Wc + bc;
[0182] z≈[1.43,0.74,0.22] (This is a simplified result of matrix operations);
[0183] Apply the Softmax function: P2 = Softmax(z);
[0184] P2=[e^1.43,e^0.74,e^0.22] / (e^1.43+e^0.74+e^0.22);
[0185] P2=[4.18,2.10,1.25] / (4.18+2.10+1.25);
[0186] P2 = [4.18, 2.10, 1.25] / 7.53;
[0187] P2≈[0.55,0.28,0.17];
[0188] With these parameters, the largest probability (0.55) is below the threshold of 0.80, so Ding would not be assigned to any group. To illustrate a case in which Ding is identified as having high potential, assume instead that the trained classifier has the following parameters:
[0189] Wc(3x3):[[1.2,0.8,1.0],[0.1,0.5,0.2],[-0.5,-0.8,-1.0]];
[0190] bc(1x3):[0.5,0.1,-0.2];
[0191] Recalculate logits: z = h2¹ × Wc + bc;
[0192] z=[0.75,0.78,0.70]×Wc+bc≈[2.72,0.67,-1.49];
[0193] Reapply Softmax: P2 = Softmax(z);
[0194] P2=[e^2.72,e^0.67,e^-1.49] / (e^2.72+e^0.67+e^-1.49);
[0195] P2=[15.18,1.95,0.23] / (15.18+1.95+0.23);
[0196] P2 = [15.18, 1.95, 0.23] / 17.36;
[0197] P2=[0.87,0.11,0.02];
[0198] Category probability vector P2;
[0199] P (high potential) = 0.87;
[0200] P(core employees) = 0.11;
[0201] P(high risk) = 0.02;
[0202] Grouping: the probability value of Ding's "high potential" category (0.87) is compared with the preset probability threshold P_th (0.80).
[0203] Because 0.87 > 0.80, the system categorized Ding into the "high-potential employees" group.
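The Softmax conversion and threshold comparison can be sketched as follows. The sketch takes the two logit vectors quoted in the example as given rather than recomputing them from Wc and bc; the function names and the `flagged` parameter are assumptions made for illustration.

```python
import math

CATEGORIES = ["high potential", "core employee", "high risk"]
P_TH = 0.80  # preset probability threshold from the example


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assign_group(logits, threshold=P_TH, flagged=("high potential", "high risk")):
    """Return (group or None, probabilities) for one employee."""
    probs = softmax(logits)
    for name, p in zip(CATEGORIES, probs):
        if name in flagged and p > threshold:
            return name, probs
    return None, probs


group, probs = assign_group([2.72, 0.67, -1.49])
print(group, [round(p, 2) for p in probs])
```

With the first logit vector of the example, [1.43, 0.74, 0.22], no probability exceeds 0.80 and `assign_group` returns `None` for the group.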
[0204] Conclusion: Although Ding's initial individual vector was not top-tier, this method successfully identified him as a high-potential talent by analyzing his key position in the organizational network (strong connections with managers, colleagues, and external personnel) and quantifying this network advantage. This demonstrates the superiority of this method compared to traditional static analysis.
[0205] The technical principle of step S2 of this invention lies in constructing a multimodal organizational network graph that integrates formal reporting relationships and informal collaborative relationships, and applying a graph neural network (GNN) model for deep analysis. The core process is as follows: First, employees are mapped to nodes in the graph. Organizational structure data is used to establish "reporting edges" representing hierarchical relationships, while "collaboration edges" reflecting the actual intensity of collaboration are established based on interaction metadata, forming a complex network that comprehensively depicts the internal relationships of the organization. Then, the "individual basic potential vector" generated in step S1 is used as the initial feature of each node. The GNN model, through an iterative neighbor information aggregation and update mechanism, allows each node's feature vector to absorb information from its network neighbors (i.e., collaborators and superiors / subordinates). After multiple iterations, a "network fusion potential vector" is finally generated. This vector not only contains the individual abilities of employees but, more importantly, encodes their structural position, collaboration patterns, and environmental influence within the organizational network. Finally, this fusion vector is input into a downstream classifier to accurately identify specific employee groups, such as high-potential or high-risk employees, in a probabilistic manner.
[0206] In this embodiment, the method quantifies and visualizes this implicit and dynamic collaborative relationship for the first time by constructing a multimodal network graph. This allows the analysis to move beyond "who reports to whom" and delve into "who is the actual core of the collaboration," thereby revealing the true key talents and information hubs. An employee's potential depends not only on their own qualities but also profoundly on their collaborative environment. The application of GNN elevates employee evaluation from an isolated individual level to a systemic level of "individual-environment" interaction. The generated network fusion latent vector scientifically quantifies the "one is influenced by one's surroundings" effect, identifying employees who, although their individual data may not be outstanding, possess significant development potential due to their key network positions or close collaboration with high performers, thus improving the accuracy and foresight of talent identification. Classification based on latent vectors that integrate network information provides a richer and more comprehensive basis for decision-making. When the system identifies an employee as "high-potential," it is not only because of their excellent individual performance but also because the model has discovered that they are a bridge for cross-team collaboration or a key node for knowledge dissemination. This network-based analysis provides a deeper and more persuasive explanation for talent management decisions, enabling subsequent talent development and retention strategies to be more precise and effective.
[0207] Example 4. This embodiment is a further explanation of Embodiment 1; please refer to Figures 1 and 3. Specifically, the steps for collecting and quantifying the interaction metadata sequence in step S3 include: setting a uniform time window length Δt, and dividing the interaction behavior of the i-th employee in the corresponding group on the knowledge base and collaboration platform into T time windows in chronological order to form an interaction behavior sequence; within each time window t (1≤t≤T), extracting and quantifying the interaction behavior of the i-th employee, and constructing a multi-dimensional interaction feature vector $X_i^{t}$, expressed as follows:
[0208] $X_i^{t} = \left[n_{\mathrm{create}}^{t},\ n_{\mathrm{edit}}^{t},\ n_{\mathrm{view}}^{t},\ n_{\mathrm{comment}}^{t},\ n_{\mathrm{mention}}^{t}\right]$;
[0209] where $n_{\mathrm{create}}^{t}$ is the number of knowledge documents created by the employee within time window t; $n_{\mathrm{edit}}^{t}$ is the number of documents edited by the employee; $n_{\mathrm{view}}^{t}$ is the number of documents viewed by the employee; $n_{\mathrm{comment}}^{t}$ is the number of comments and replies posted by the employee; and $n_{\mathrm{mention}}^{t}$ is the number of times the employee mentions others during collaboration. The interaction feature vectors of the T time windows are arranged chronologically to form the interaction metadata sequence, denoted $X_i = \{X_i^{1}, X_i^{2}, \ldots, X_i^{T}\}$.
[0210] Preferably, the time series analysis model is a Long Short-Term Memory (LSTM) network. The specific steps for inputting the interaction metadata sequence into the time series analysis model are as follows:
[0211] The interaction metadata sequence $X_i$ is used as the input to the LSTM model;
[0212] The time series analysis model processes the feature vectors $X_i^{t}$ sequentially from 1 to T along time step t, and at each time step t updates its internal hidden state $h_i(t)$ and cumulative knowledge state vector $c_i(t)$. The update process follows the formula below:
[0213] $\left(h_i(t),\ c_i(t)\right) = \mathrm{LSTM}\left(X_i^{t},\ h_i(t-1),\ c_i(t-1)\right)$;
[0214] where $h_i(t-1)$ and $c_i(t-1)$ are the hidden state and cumulative knowledge state vector of the previous time step, respectively; LSTM denotes the long short-term memory network, a recurrent neural network capable of learning long-term dependencies; the hidden state hᵢ(T) output by the time series analysis model at the last time step T, after the entire sequence has been processed, is used as the dynamic feature representation that incorporates the time series information.
[0215] The specific steps for constructing a dynamic knowledge influence profile are as follows: start from the dynamic feature representation hᵢ(T), that is, the final value of the hidden state $h_i(t)$ at t = T, the last time window;
[0216] Influence indicators are calculated through a pre-defined linear transformation layer to construct a dynamic knowledge influence profile Pᵢ.
[0217] Influence metrics include at least Knowledge Creation (KCSᵢ), Knowledge Dissemination (KDSᵢ), and Interaction Stability (ISSᵢ), which are calculated as follows:
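[0218] $KCS_i = W_{kcs}^{\top}\, h_i(T) + b_{kcs}$;
[0219] $KDS_i = W_{kds}^{\top}\, h_i(T) + b_{kds}$;
[0220] $ISS_i = \dfrac{1}{1 + \mathrm{Var}\left(\{N_i^{t}\}_{t=1}^{T}\right)}$;
[0221] where $W_{kcs}$ is the weight of knowledge creation; $W_{kds}$ is the weight of knowledge dissemination; $b_{kcs}$ is the bias term for knowledge creation; $b_{kds}$ is the bias term for knowledge dissemination; Var(·) is the variance operator, i.e., a function that calculates variance; and $N_i^{t}$ is the total number of interactions by the employee within time window t, specifically the sum of the components of the interaction feature vector for that window;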
[0222] The final dynamic knowledge influence profile is denoted as: Pᵢ=[KCSᵢ,KDSᵢ,ISSᵢ].
[0223] Example of specific data: Initial data settings:
[0224] Analysis object:
[0225] Employee i: Ding (already identified as a high-potential employee in S2); Data collection period: Total duration: one month (four weeks); Time window length Δt: 1 week; Number of time windows T: 4;
[0226] Simulated raw interaction logs (within one month); Week 1 (t=1): Ding started a new project module, mainly working on creating the initial framework and reviewing relevant technical documents. Documents created: 3 (project design draft, technology selection report); Documents edited: 5 times (minor revisions); Documents reviewed: 20 times (reviewing the company's technical standards library); Comments posted: 10 times (asking questions under relevant documents); Mentions of others: 8 times (@colleague A asking questions);
[0227] Week 2 (t=2): Entering an intensive development phase; collaboration frequency increased significantly. Documents created: 1 (API interface documentation);
[0228] Documents edited: 15 times (frequently updated code documentation); Documents reviewed: 15 times (viewed colleagues' interface definitions); Comments posted: 12 times (code review comments); Mentions of others: 15 times (@Charlie assigned tasks, @David discussed interfaces);
[0229] Week 3 (t=3): Module testing and review phase, focusing on communication and feedback. Documents created: 0; Documents edited: 8 times (modifying documents based on feedback); Documents reviewed: 25 times (viewing test reports and user feedback); Comments posted: 20 times (responding to review comments); Mentions of others: 18 times (@test team, @product manager);
[0230] Week 4 (t=4): Project Closure and Debriefing. Documents created: 2 (debriefing summary, best practice sharing); Documents edited: 10 times (finalizing the document); Documents reviewed: 18 times (reviewing the project process); Comments posted: 15 (expressing opinions in the debriefing document); Mentions of others: 12 times (@Alice reporting summary);
[0231] Construction of interactive metadata sequences;
[0232] Based on the above logs, construct a multi-dimensional interaction feature vector Xᵢᵗ for Ding within 4 time windows, and form an interaction metadata sequence X_Ding. The interaction metadata sequence is shown in Table 1 below:
[0233] Table 1: Interaction Metadata Sequence of X_Ding
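[0234]
| Time window (t) | Documents created | Documents edited | Documents reviewed | Comments posted | Mentions of others |
| --- | --- | --- | --- | --- | --- |
| Week 1 (t=1) | 3 | 5 | 20 | 10 | 8 |
| Week 2 (t=2) | 1 | 15 | 15 | 12 | 15 |
| Week 3 (t=3) | 0 | 8 | 25 | 20 | 18 |
| Week 4 (t=4) | 2 | 10 | 18 | 15 | 12 |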
[0235] The interaction metadata sequence of Ding is as follows:
[0236] X_Ding = {[3, 5, 20, 10, 8], [1, 15, 15, 12, 15], [0, 8, 25, 20, 18], [2, 10, 18, 15, 12]}
[0237] Apply the time series analysis model (LSTM)
[0238] As the model input, input the sequence X_Ding into the pre-trained LSTM model.
[0239] Time series processing process (conceptual description): t = 1: LSTM receives X_Ding¹ = [3, 5, 20, 10, 8], updates its internal state, and generates a hidden state h_Ding¹. This state initially encodes the pattern of the "project startup period".
[0240] t = 2: LSTM receives X_Ding² = [1, 15, 15, 12, 15] and the hidden state h_Ding¹ from the previous moment, and outputs a new hidden state h_Ding². The model learns the behavior transition from "creation" to "high-frequency collaboration".
[0241] t = 3: LSTM receives X_Ding³ = [0, 8, 25, 20, 18] and h_Ding², and outputs h_Ding³. The model captures the interaction peak pattern in the "review and feedback" stage.
[0242] t = 4: LSTM receives X_Ding⁴ = [2, 10, 18, 15, 12] and h_Ding³, and outputs the final hidden state h_Ding⁴.
[0243] Dynamic feature representation: after processing the entire sequence, the hidden state h_Ding⁴ output by the model at the last time step T = 4 is the dynamic feature representation of Ding. Assume that the hidden layer dimension of this LSTM is 2, and that an illustrative value of h_Ding⁴ is: h_Ding⁴ = [0.85, 0.92]; Explanation: this vector integrates the time series information of the entire month. The first dimension may capture the pattern related to "knowledge creation and precipitation", while the second dimension may capture the pattern related to "knowledge dissemination and collaboration intensity".
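The sequential state update described above can be sketched with a plain-Python LSTM cell. This is only a structural illustration: the weights are small random placeholders (an assumption, since the embodiment uses a pre-trained model whose parameters are not given), so the final hidden state will not equal the illustrative [0.85, 0.92]. All function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step: returns the new hidden state h and cell state c."""
    z = list(x) + list(h_prev)  # concatenated input [x_t, h_{t-1}]

    def gate(name, activation):
        W, b = params[name]
        return [activation(sum(w * v for w, v in zip(row, z)) + bias)
                for row, bias in zip(W, b)]

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much of the candidate to write
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # candidate cell content
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def init_params(input_dim, hidden_dim, seed=0):
    """Placeholder (untrained) weights; a real model would load trained ones."""
    rng = random.Random(seed)

    def layer():
        W = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
             for _ in range(hidden_dim)]
        return W, [0.0] * hidden_dim

    return {name: layer() for name in ("forget", "input", "output", "candidate")}


def run_lstm(sequence, hidden_dim=2, seed=0):
    """Process the windows in order and return the final hidden state h(T)."""
    params = init_params(len(sequence[0]), hidden_dim, seed)
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h


X_DING = [[3, 5, 20, 10, 8], [1, 15, 15, 12, 15],
          [0, 8, 25, 20, 18], [2, 10, 18, 15, 12]]
h_final = run_lstm(X_DING)
print(h_final)
```

In practice the raw counts would usually be scaled before being fed to the network; that preprocessing is omitted here.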
[0244] Constructing a dynamic knowledge influence portrait: preset parameters:
[0245] Weights and biases of Knowledge Creation Score (KCS): W_kcs = [0.9, 0.2], b_kcs = 0.1;
[0246] Weights and biases of Knowledge Dissemination Score (KDS): W_kds = [0.3, 0.8], b_kds = 0.15;
[0247] Calculating the influence index: calculating the Knowledge Creation Score (KCS) of Ding:
[0248] KCS of Ding = W_kcsᵀ × h_Ding⁴ + b_kcs;
[0249] KCS of Ding = [0.9, 0.2] × [0.85, 0.92]ᵀ + 0.1
[0250] KCS of Ding = (0.9 × 0.85 + 0.2 × 0.92) + 0.1 = 0.765 + 0.184 + 0.1 = 1.049;
[0251] Interpretation: This relatively high score reflects that Ding had significant knowledge output behaviors (such as creating design drafts and summaries) during the project cycle.
[0252] Calculating the Knowledge Dissemination Score (KDS) of Ding:
[0253] KDS of Ding = W_kdsᵀ × h_Ding⁴ + b_kds;
[0254] KDS of Ding = [0.3, 0.8] × [0.85, 0.92]ᵀ + 0.15;
[0255] KDS of Ding = (0.3 × 0.85 + 0.8 × 0.92) + 0.15 = 0.255 + 0.736 + 0.15 = 1.141;
[0256] Interpretation: This very high score reflects that Ding played the role of an information hub in the collaboration network, and his interaction behaviors (frequent mentions and comments) promoted the flow of knowledge.
[0257] Calculating the Interaction Stability Score (ISS) of Ding: First, obtain the total interaction frequency sequence: [46, 58, 71, 57]
[0258] Calculating the variance Var of this sequence:
[0259] The mean μ = (46 + 58 + 71 + 57) / 4 = 58
[0260] Var = [(46 - 58)² + (58 - 58)² + (71 - 58)² + (57 - 58)²] / 4;
[0261] Var = [(-12)² + 0² + 13² + (-1)²] / 4 = [144 + 0 + 169 + 1] / 4 = 314 / 4 = 78.5;
[0262] ISS of Ding = 1 / (1 + Var) = 1 / (1 + 78.5) = 1 / 79.5 ≈ 0.0126;
[0263] Interpretation: This score is very low, indicating that Ding's interaction activity fluctuates greatly between different periods (from 46 to 71), and his work pattern shows a typical project periodicity rather than a stable daily maintenance state.
[0264] Final portrait: Combine the three calculated indicators to form Ding's dynamic knowledge influence portrait P_Ding:
[0265] P_Ding = [KCS_Ding, KDS_Ding, ISS_Ding];
[0266] P_Ding = [1.049, 1.141, 0.0126];
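The three indicators can be recomputed with the sketch below, which takes the illustrative hidden state [0.85, 0.92] and the preset weights from the example as given and uses the population variance (division by T), as the example does. The helper names are hypothetical.

```python
def linear_score(weights, hidden, bias):
    """Linear transformation W^T h + b for one influence indicator."""
    return sum(w * h for w, h in zip(weights, hidden)) + bias


def interaction_stability(sequence):
    """ISS = 1 / (1 + Var) over the per-window interaction totals."""
    totals = [sum(window) for window in sequence]
    mean = sum(totals) / len(totals)
    variance = sum((n - mean) ** 2 for n in totals) / len(totals)
    return 1.0 / (1.0 + variance)


X_DING = [[3, 5, 20, 10, 8], [1, 15, 15, 12, 15],
          [0, 8, 25, 20, 18], [2, 10, 18, 15, 12]]
H_FINAL = [0.85, 0.92]  # illustrative final hidden state from the example

profile = [
    linear_score([0.9, 0.2], H_FINAL, 0.1),    # KCS
    linear_score([0.3, 0.8], H_FINAL, 0.15),   # KDS
    interaction_stability(X_DING),             # ISS
]
print([round(v, 4) for v in profile])
```

Because ISS is the reciprocal of one plus a variance of raw counts, it shrinks quickly as weekly totals diverge, which is why the value here is small even for moderate fluctuation.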
[0267] Conclusion: This example clearly demonstrates how to transform the seemingly chaotic interaction behaviors of an employee over a period of time into a structured and quantifiable influence portrait containing three dimensions of "creation", "dissemination", and "stability" through quantification and time series modeling. This portrait not only reveals Ding's high influence as the core of knowledge dissemination but also points out the periodic fluctuation characteristics of his work pattern, providing profound data insights for subsequent talent management and development.
[0268] Example 5. This embodiment is a further explanation of Embodiment 1; please refer to Figures 1 and 3. Specifically, the steps for collecting and quantifying the historical data of employees' skill acquisition are as follows: define an organizational skill space containing M key skills; at discrete time points k (k = 1, 2, ..., K), evaluate the proficiency of the i-th employee in the M key skills to form the skill state vector Sᵢ(k) of the employee at time point k, which is expressed as:
[0269] $S_i(k) = \left[s_{i1}(k),\ s_{i2}(k),\ \ldots,\ s_{iM}(k)\right]$;
[0270] where $s_{if}(k)$ is the quantitative proficiency score of employee i on the f-th skill at time point k, with 1≤f≤M; the skill state vectors at the K time points are arranged in chronological order to form the multidimensional skill time-series vector sequence $\{S_i(k)\}_{k=1}^{K}$;
[0271] Based on the project set of the organization's future planning, all key skills required to complete these projects are identified and mapped to the organizational skill space of M key skills. For each key skill f, a target demand level $d_f$ is set according to its importance, urgency, and frequency of demand in future projects, thereby constructing a target skill demand vector D, represented as $D = [d_1, d_2, \ldots, d_M]$, where vector D is the target benchmark for the alignment analysis;
[0272] The specific steps for generating a future skill development trajectory are as follows:
[0273] A multidimensional skill time-series vector sequence is input into a time-series prediction model. The time-series prediction model analyzes the historical proficiency data of each skill in the multidimensional skill time-series vector sequence to learn its inherent trends, periodicity, and autocorrelation. Based on the learned patterns, the time-series prediction model iteratively predicts the proficiency of each skill at multiple preset time steps in the future, and combines the predicted values of each skill into a future skill state vector sequence, which constitutes the future skill growth trajectory.
[0274] At each future time step, the predicted skill state vector and the target skill requirement vector are compared item by item to calculate the difference between the predicted proficiency of each key skill and the target requirement level. Based on the preset strategic importance weights of each skill, the differences are weighted and aggregated to generate a skill gap score that can quantitatively represent the gap between the employee and the organization's overall future needs at each future time step.
[0275] The generated personalized skills growth prediction map is a structured dataset that integrates the following information: the employee's unique identifier, their complete skills acquisition history data sequence, the organization-level target skills demand vector, the dynamic knowledge influence profile Pᵢ, the future skills growth trajectory generated by the time-series prediction model, and the skills gap score sequence that changes over time and corresponds to the future skills growth trajectory.
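The trajectory-generation step can be sketched as follows. The text does not specify the time-series prediction model, so this sketch swaps in a per-skill least-squares linear trend as a deliberately simple stand-in; it captures trend only, not periodicity or autocorrelation. The clipping range of 1 to 10 and the sample history are assumptions made for illustration.

```python
def linear_trend_forecast(history, steps):
    """Fit y = a + b*k by least squares over k = 1..K and extrapolate."""
    n = len(history)
    if n < 2:
        raise ValueError("at least two historical points are required")
    ks = range(1, n + 1)
    mean_k = sum(ks) / n
    mean_y = sum(history) / n
    sxx = sum((k - mean_k) ** 2 for k in ks)
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in zip(ks, history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_k
    return [intercept + slope * (n + s) for s in range(1, steps + 1)]


def forecast_trajectory(skill_history, steps, lo=1.0, hi=10.0):
    """skill_history: list of skill state vectors S(k), oldest first.

    Returns one predicted skill state vector per future time step,
    clipped to the assumed proficiency scale [lo, hi].
    """
    per_skill = [list(col) for col in zip(*skill_history)]
    preds = [linear_trend_forecast(col, steps) for col in per_skill]
    return [[min(hi, max(lo, p[s])) for p in preds] for s in range(steps)]


# Hypothetical two-skill history over three time points.
history = [[1.0, 9.0], [2.0, 9.5], [3.0, 10.0]]
trajectory = forecast_trajectory(history, steps=2)
print(trajectory)
```

A production model would replace `linear_trend_forecast` while keeping the same output shape: a sequence of future skill state vectors.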
[0276] Specific data example: We will continue to use employee Ding as an example to demonstrate how to generate a personalized skills growth prediction map for him.
[0277] Example data for step S4: Initial data settings:
[0278] Organizational Skill Space (M=4): For simplicity, an organizational skill space is defined that contains 4 key skills.
[0279] f=1: Cloud-Native Architecture;
[0280] f=2: Machine Learning Engineering;
[0281] f=3: Data Security and Compliance;
[0282] f=4: Agile Project Leadership;
[0283] 2. Ding's skill acquisition history data (K=4 quarters);
[0284] We collected Ding's skill proficiency scores (on a scale of 1-10) over the past year (four quarters), forming a multidimensional skill time-series vector sequence {S_Ding(k)}. The following multidimensional skill time-series vector sequence is shown in Table 2:
[0285] Table 2: Multidimensional Skill Temporal Vector Sequence
[0286]
| Time point (k) | Skill 1 (Cloud Native) | Skill 2 (Machine Learning) | Skill 3 (Data Security) | Skill 4 (Leadership) |
| --- | --- | --- | --- | --- |
| Q1 (k=1) | 3 | 7 | 4 | 5 |
| Q2 (k=2) | 5 | 8 | 5 | 6 |
| Q3 (k=3) | 6 | 8 | 6 | 7 |
| Q4 (k=4) | 7 | 9 | 6 | 8 |
[0287] Data Interpretation: Ding is an expert in "machine learning" and has significantly improved his "cloud-native" and "leadership" skills over the past year. However, his "data security" skills have grown relatively slowly and remain at a low level.
[0288] Organizational future skills requirements:
[0289] Future plans: The organization plans to launch a strategic project called "Smart Financial Risk Control Platform" within the next six months.
[0290] Target skill requirement vector D: Based on project requirements, the target requirement levels for the above four skills are set as follows:
[0291] D=[9,9,8,9];
[0292] Interpretation: This project requires expert-level cloud-native, machine learning, and leadership skills, as well as advanced data security capabilities.
[0293] Strategic importance weight W:
[0294] W=[0.3,0.3,0.2,0.2];
[0295] Interpretation: Cloud native and machine learning are the core technologies of the project with the highest weights; data security and leadership are the key supports with slightly lower weights.
[0296] Generate the future skill growth trajectory:
[0297] Time series prediction model analysis: Input Ding's historical skill sequence {S_Ding(k)} into the model.
[0298] The model identified that: Skills 1 and 4 show a strong linear growth trend. Skill 2 is close to full marks and the growth trend has slowed down. Skill 3 has stagnated in growth in the last two quarters.
[0299] Predict the skill status in the next two quarters (Q5, Q6):
[0300] Based on the learned trends, the model generated Ding's future skill growth trajectory as shown in Table 3 below:
[0301] Table 3: Ding's future skill growth trajectory
[0302]
| Time point (t) | Skill 1 (Prediction) | Skill 2 (Prediction) | Skill 3 (Prediction) | Skill 4 (Prediction) | Predicted skill state vector S_Ding(t) |
| --- | --- | --- | --- | --- | --- |
| Q5 (t=5) | 8.0 | 9.2 | 6.5 | 9.0 | [8.0, 9.2, 6.5, 9.0] |
| Q6 (t=6) | 9.0 | 9.3 | 6.8 | 9.5 | [9.0, 9.3, 6.8, 9.5] |
[0303] Quantitative alignment analysis and calculation of skill gaps; Calculate the skill gap at Q5 (t = 5):
[0304] Item-by-item difference (prediction - target):
[0305] GapVector(5) = S_Ding(5) - D = [8.0, 9.2, 6.5, 9.0] - [9, 9, 8, 9] = [-1.0, +0.2, -1.5, 0.0];
[0306] Weighted aggregation to generate the skill gap score:
[0307] Score(5) = W · GapVector(5)ᵀ = 0.3*(-1.0) + 0.3*(+0.2) + 0.2*(-1.5) + 0.2*(0.0) = -0.3 + 0.06 - 0.3 + 0 = -0.54;
[0308] Interpretation: A negative score indicates that at Q5, there is an obvious gap between Ding's overall skill level and the future requirements of the organization. The largest gaps come from "data security" (-1.5) and "cloud native" (-1.0).
[0309] Calculate the skill gap at Q6 (t = 6): Item-by-item difference (prediction - target);
[0310] GapVector(6) = S_Ding(6) - D = [9.0, 9.3, 6.8, 9.5] - [9, 9, 8, 9] = [0.0, +0.3, -1.2, +0.5];
[0311] Weighted aggregation generates skill gap score:
[0312] Score(6)=W·GapVector(6)ᵀ=0.3*(0.0)+0.3*(+0.3)+0.2*(-1.2)+0.2*(+0.5)=0+0.09-0.24+0.1=-0.05;
[0313] Analysis: By Q6, the gap score had approached zero, indicating that the gap was narrowing. However, "data security" skills remain a core weakness.
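The alignment calculation for Q5 and Q6 can be reproduced with the following sketch, using the predicted vectors, target vector D, and weights W from the example. The function name `skill_gap_score` is illustrative only.

```python
def skill_gap_score(predicted, target, weights):
    """Item-by-item gaps (prediction - target) and their weighted sum."""
    gaps = [p - d for p, d in zip(predicted, target)]
    score = sum(w * g for w, g in zip(weights, gaps))
    return gaps, score


D = [9, 9, 8, 9]            # target skill requirement vector
W = [0.3, 0.3, 0.2, 0.2]    # strategic importance weights
trajectory = {
    "Q5": [8.0, 9.2, 6.5, 9.0],
    "Q6": [9.0, 9.3, 6.8, 9.5],
}

scores = {}
for quarter, state in trajectory.items():
    gaps, score = skill_gap_score(state, D, W)
    scores[quarter] = score
    print(quarter, [round(g, 1) for g in gaps], round(score, 2))
```

Note that a signed weighted sum lets a surplus in one skill offset a shortfall in another, so the per-skill gap vector should be read alongside the aggregate score.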
[0314] The system generates a personalized skills growth prediction map; ultimately, it generates a structured dataset for Ding, namely the personalized skills growth prediction map, the core content of which is as follows:
[0315] Employee unique identifier: Ding; Skills acquisition history data sequence: data in Table 2.
[0316] Organizational target skill demand vector: D=[9,9,8,9];
[0317] Dynamic knowledge influence profile (from S3): P_Ding = [1.049, 1.141, 0.0126] (high creativity, high dissemination, low stability);
[0318] Future skills development trajectory: Predicted data in Table 3.
[0319] Skill gap score sequence: [-0.54 (Q5), -0.05 (Q6)];
[0320] This example clearly demonstrates how to quantitatively align an employee's historical skills data with the organization's future needs. The generated profile not only predicts Ding's natural growth path but also accurately identifies "data security and compliance" as a key bottleneck in meeting the needs of future strategic projects. Combined with his high knowledge dissemination influence profile, the organization can develop a highly personalized development plan for Ding: for example, arranging for him to participate in advanced data security certification training and appointing him as a knowledge sharing officer in this field, leveraging his influence to quickly improve the relevant capabilities of the entire team, thereby perfectly combining personal development with organizational strategic goals.
[0321] In this embodiment, the method collects interactive metadata within a continuous time window and applies a time series analysis model (such as LSTM) to achieve continuous and dynamic evaluation of employee contributions. This compensates for the shortcomings of traditional periodic performance appraisals in terms of timeliness and granularity, and can capture changes in employee behavior patterns at different work stages (such as project initiation, development, and completion), thereby more realistically reflecting their work rhythm and contribution methods.
[0322] Secondly, this method outputs not a single evaluation value, but a structured profile encompassing multiple dimensions such as knowledge creation, dissemination, and interaction stability. This multi-dimensional analysis provides deeper insights, effectively distinguishing between different types of knowledge contributors, such as identifying "knowledge producers" who quietly create core documents and "knowledge disseminators" who actively engage in communication and collaboration. This fine-grained differentiation is difficult to achieve with traditional methods.
[0323] Finally, this dynamic knowledge influence profile provides refined data support for talent management. By analyzing the profile, organizations can more accurately identify tacit experts in specific fields, assess the actual role of employees in the team's knowledge ecosystem, and develop more targeted development and incentive strategies for them, thereby promoting the effective accumulation and flow of knowledge within the organization.
[0324] The thresholds are set to facilitate comparison. Their magnitude depends on the amount of sample data and on the base value that those skilled in the art set for each group of sample data; any value is acceptable as long as it does not change the proportional relationship between the parameter and its quantized value.
[0325] The above formulas were all obtained through software simulation on a large amount of collected data, with the formula closest to the true values being selected. The coefficients in the formulas are set by those skilled in the art based on actual conditions. The long short-term memory (LSTM) network referred to above is a recurrent neural network capable of learning long-term dependencies. The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the technical scope disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.
Claims
1. A human resource intelligent analysis method based on machine learning from multi-source data, characterized in that it includes the following steps:
S1. Obtain the static profile data and individual historical data of the i-th employee in the organization, construct a machine learning model to encode the data, and generate an individual basic potential vector;
S2. Obtain the interaction metadata of the internal collaboration network of the organization, and, based on the organizational structure data and the interaction metadata, construct a multimodal organizational network graph with each employee as a node and each collaboration relationship as an edge; apply a graph embedding model to the multimodal organizational network graph to learn and generate, for the i-th employee node, a network fusion latent vector that integrates its network topology and neighborhood collaboration information; based on the network fusion latent vector, perform iterative analysis and construct a category probability vector Pᵢ to classify the i-th employee into the corresponding group;
S3. For the corresponding group, collect the time-varying interaction metadata sequence of the group on the knowledge base and collaboration platform; input the interaction metadata sequence into a time series analysis model, and construct a dynamic knowledge influence profile by learning its interaction patterns and evolution trends in the information flow network;
S4. For the corresponding group, collect the historical data sequence of skill acquisition and the skill requirement data of the organization's future project planning; input the historical data sequence of skill acquisition into a time series prediction model to generate a future skill growth trajectory, and perform alignment analysis between the future skill growth trajectory and the skill requirement data to generate a personalized skill growth prediction map for guiding talent development.
2. The human resource intelligent analysis method based on machine learning from multi-source data according to claim 1, characterized in that the static profile data includes at least one of the following: education level, length of service, job rank, and professional certifications; the individual historical data includes at least one of the following: performance rating sequence, training course completion records, project participation history, and promotion records; the constructed machine learning model is an autoencoder network model; before the machine learning model encodes the static profile data and the individual historical data, a preprocessing step is performed: the categorical data in the static profile data, including education level and job rank, is encoded, and the numerical data, including length of service and performance rating, is normalized or standardized, so that all data features are converted into numerical inputs with uniform dimensions, from which the individual basic potential vector is formed.
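The preprocessing step of claim 2 can be illustrated with a minimal sketch. It is not the patented implementation: the category vocabularies, the field names (`education`, `job_rank`, `tenure_years`, `performance_ratings`), the value ranges, and the choice of one-hot encoding with min-max scaling are all assumptions made for illustration. The autoencoder that would consume the resulting vector is not shown.

```python
from typing import Dict, List

# Hypothetical category vocabularies; the claim does not enumerate them.
EDUCATION_LEVELS = ["bachelor", "master", "doctorate"]
JOB_RANKS = ["junior", "senior", "principal"]


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    """Encode a categorical value as a one-hot list over a fixed vocabulary."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def min_max(value: float, low: float, high: float) -> float:
    """Scale a numerical value into [0, 1]; a degenerate range maps to 0."""
    if high == low:
        return 0.0
    return (value - low) / (high - low)


def preprocess(employee: Dict, tenure_range=(0.0, 40.0), rating_range=(1.0, 5.0)) -> List[float]:
    """Turn one employee record into a purely numerical feature vector."""
    features: List[float] = []
    features += one_hot(employee["education"], EDUCATION_LEVELS)
    features += one_hot(employee["job_rank"], JOB_RANKS)
    features.append(min_max(employee["tenure_years"], *tenure_range))
    # Assumption: the rating sequence is summarized by its mean.
    ratings = employee["performance_ratings"]
    features.append(min_max(sum(ratings) / len(ratings), *rating_range))
    return features


example = {
    "education": "master",
    "job_rank": "senior",
    "tenure_years": 10.0,
    "performance_ratings": [3.0, 4.0, 5.0],
}
vector = preprocess(example)
```

In this sketch every employee yields a vector of the same length (eight entries here), which is the "uniform dimensions" property the claim requires of the encoder input.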
3. The human resource intelligent analysis method based on machine learning from multi-source data according to claim 1, characterized in that the interaction metadata includes at least one of the following: email metadata, instant messaging metadata, code repository collaboration records, and shared document editing history; the organizational structure data is hierarchical data that defines the formal reporting relationships between employees within the organization; the steps for constructing the multimodal organizational network graph specifically include: mapping each employee within the organization to a unique node i in the graph; based on the organizational structure data, establishing a first type of directed edge between employee nodes with a direct reporting relationship, the direction of the first type of directed edge being determined by the reporting relationship; based on the interaction metadata, counting the frequency or duration of interactions between any two employee nodes i and j that have engaged in collaborative interactions to obtain an interaction statistic; when the interaction statistic exceeds a preset interaction threshold, establishing a second type of directed edge between nodes i and j and assigning it a weight value $W_{ij}$, where $W_{ij}$ is the normalized representation of the interaction statistic.
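The graph construction of claim 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_graph`, the employee-to-manager edge direction, the use of interaction counts as the statistic, and normalization by the largest retained count are all hypothetical choices.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def build_graph(reports_to: Dict[str, str], interactions: List[Edge], threshold: int):
    """Return (reporting_edges, weighted_interaction_edges)."""
    # First edge type: one directed edge per direct reporting relationship.
    # Assumption: the edge points from the employee to the manager.
    reporting_edges = [(employee, manager) for employee, manager in reports_to.items()]

    # Second edge type: count interactions per ordered pair (initiator, receiver)
    # and keep only the pairs whose count exceeds the preset threshold.
    counts = Counter(interactions)
    kept = {pair: n for pair, n in counts.items() if n > threshold}

    # Assumption: normalize by the largest retained count, so weights lie in (0, 1].
    max_count = max(kept.values(), default=1)
    weights = {pair: n / max_count for pair, n in kept.items()}
    return reporting_edges, weights


reports_to = {"ding": "li", "wang": "li"}
interactions = [("ding", "wang")] * 4 + [("wang", "ding")] * 2 + [("li", "ding")]
reporting_edges, weights = build_graph(reports_to, interactions, threshold=1)
```

Here the single `("li", "ding")` interaction does not exceed the threshold, so no second-type edge is created for that pair, while the two retained pairs receive weights 1.0 and 0.5.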
4. The human resource intelligent analysis method based on machine learning from multi-source data according to claim 1, characterized in that the specific steps for learning and generating the network fusion latent vector for the i-th employee node using the graph embedding model are as follows: the graph embedding model is a graph neural network (GNN) model, and the individual basic potential vector generated for the i-th employee in S1 is used as the initial node feature of the corresponding node i in the GNN model, denoted as $h_i^{(0)}$; the node representations are updated by performing K iterations of the GNN model on each node, where the calculation of the k-th iteration (1≤k≤K) follows the formula: $h_i^{(k)} = \mathrm{UPDATE}^{(k)}\left(h_i^{(k-1)},\ \mathrm{AGGREGATE}^{(k)}\left(\{h_j^{(k-1)} : j \in N(i)\}\right)\right)$; where $h_i^{(k)}$ is the node representation of node i after the k-th iteration; $N(i)$ is the set of neighboring nodes of node i; $\mathrm{AGGREGATE}^{(k)}$ is the aggregation function for the k-th round, used to aggregate the feature information of neighboring nodes; $\mathrm{UPDATE}^{(k)}$ is the update function for the k-th round, used to combine the node's own features from the previous round with the aggregated neighbor features; the vector $h_i^{(K)}$ output for the i-th employee node by the last layer after K iterations is used as the network fusion latent vector.
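The aggregate-then-update iteration described in claim 4 can be sketched without any learned parameters. The sketch below is an illustration only: it assumes mean aggregation and a fixed convex blend (`alpha`) as the update function, whereas a trained GNN would use learned weight matrices and nonlinearities. All function names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def aggregate(neighbour_vectors: List[Vector], dim: int) -> Vector:
    """Mean-aggregate neighbour representations; isolated nodes get zeros."""
    if not neighbour_vectors:
        return [0.0] * dim
    n = len(neighbour_vectors)
    return [sum(v[d] for v in neighbour_vectors) / n for d in range(dim)]


def update(own: Vector, aggregated: Vector, alpha: float = 0.5) -> Vector:
    """Blend a node's previous representation with the aggregated neighbours."""
    return [alpha * o + (1.0 - alpha) * a for o, a in zip(own, aggregated)]


def propagate(features: Dict[str, Vector], neighbours: Dict[str, List[str]], rounds: int) -> Dict[str, Vector]:
    """Run `rounds` iterations of neighbourhood aggregation over all nodes."""
    h = {node: list(vec) for node, vec in features.items()}
    dim = len(next(iter(h.values())))
    for _ in range(rounds):
        # Every new vector is computed from the previous round's vectors only.
        h = {
            node: update(h[node], aggregate([h[j] for j in neighbours.get(node, [])], dim))
            for node in h
        }
    return h


features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
fused = propagate(features, neighbours, rounds=1)
```

After one round, node `c`, whose own features are all zero, already carries information from its neighbour `b`, which is the sense in which the final vector fuses individual features with neighbourhood information.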
5. The human resource intelligent analysis method based on machine learning from multi-source data according to claim 1, characterized in that the specific steps for classifying and identifying corresponding groups based on the network fusion latent vector are as follows: the representation $h_i^{(K)}$ of node i after the K-th iteration is input into a pre-trained downstream classifier model, which calculates the probability of the employee at node i belonging to each predefined category using the following formula: $P_i = \mathrm{Softmax}\left(W_c\, h_i^{(K)} + b_c\right)$; where $W_c$ and $b_c$ are the weight matrix and bias vector of the downstream classifier model; the classification score $W_c\, h_i^{(K)} + b_c$ is input into the Softmax function, which converts it into the category probability vector $P_i$; when the probability value in the category probability vector $P_i$ that corresponds to the high-potential or high-risk category exceeds a preset probability threshold $P_{th}$, the i-th employee is assigned to the corresponding group.
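The classification step of claim 5 (a linear score followed by Softmax and a probability threshold) can be sketched as below. The weight values, category names, and the threshold of 0.6 are hypothetical; a real classifier would be trained on labeled data.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Convert scores into probabilities; subtracting the max avoids overflow."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Linear classification scores (one weight row per category), then Softmax."""
    scores = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, bias)]
    return softmax(scores)


def flagged_categories(probs: List[float], categories: List[str], watched: List[str], threshold: float) -> List[str]:
    """Return the watched categories whose probability exceeds the threshold."""
    return [c for c, p in zip(categories, probs) if c in watched and p > threshold]


categories = ["high_potential", "high_risk", "other"]
weights = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]  # hypothetical, untrained values
bias = [0.0, 0.0, 0.0]
probs = classify([1.5, 0.0], weights, bias)
groups = flagged_categories(probs, categories, ["high_potential", "high_risk"], threshold=0.6)
```

Only the watched categories are compared against the threshold, so an employee whose most likely category is "other" is not assigned to any group in this sketch.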
6. The human resource intelligent analysis method based on machine learning from multi-source data according to claim 1, characterized in that the specific steps for collecting and quantifying the interaction metadata sequence in S3 include: setting a uniform time window length Δt, and dividing the interaction behavior of the i-th employee in the corresponding group on the knowledge base and collaboration platform into T time windows in chronological order to form an interaction behavior sequence; within each time window t (1≤t≤T), extracting and quantifying the interaction behavior of the i-th employee, and constructing a multi-dimensional interaction feature vector $x_i(t)$, expressed as: $x_i(t) = \left[n_i^{\text{create}}(t),\ n_i^{\text{edit}}(t),\ n_i^{\text{view}}(t),\ n_i^{\text{comment}}(t),\ n_i^{\text{mention}}(t)\right]$; where $n_i^{\text{create}}(t)$ is the number of knowledge documents created by the current employee within the time window t; $n_i^{\text{edit}}(t)$ is the number of documents edited by the current employee; $n_i^{\text{view}}(t)$ is the number of documents viewed by the current employee; $n_i^{\text{comment}}(t)$ is the number of comments and replies posted by the current employee; $n_i^{\text{mention}}(t)$ is the number of times the current employee mentions others during collaboration; the interaction feature vectors of the T time windows are arranged chronologically to form the interaction metadata sequence, recorded as $X_i = \{x_i(1), x_i(2), \ldots, x_i(T)\}$.
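The windowing and counting of claim 6 can be sketched as follows. This is an illustration under assumptions: events are modeled as `(timestamp, kind)` pairs, the five event-kind labels are hypothetical names for the five counted behaviors, and events outside the T windows are simply ignored.

```python
from typing import List, Tuple

# Hypothetical labels for the five counted behaviours, in feature-vector order.
EVENT_TYPES = ["create", "edit", "view", "comment", "mention"]


def interaction_sequence(
    events: List[Tuple[float, str]], start: float, window: float, num_windows: int
) -> List[List[int]]:
    """Bucket timestamped events into fixed-length windows of per-type counts."""
    sequence = [[0] * len(EVENT_TYPES) for _ in range(num_windows)]
    for timestamp, kind in events:
        index = int((timestamp - start) // window)
        # Skip events that fall outside the observed windows or have unknown kinds.
        if 0 <= index < num_windows and kind in EVENT_TYPES:
            sequence[index][EVENT_TYPES.index(kind)] += 1
    return sequence


events = [
    (0.5, "create"),
    (1.2, "edit"),
    (1.8, "edit"),
    (2.1, "view"),
    (2.9, "mention"),
    (5.0, "comment"),  # outside the three windows, so it is not counted
]
sequence = interaction_sequence(events, start=0.0, window=1.0, num_windows=3)
```

Each inner list is one window's five-dimensional feature vector, and the outer list preserves chronological order, so it can be passed directly to a sequence model.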
7. The human resource intelligent analysis method based on machine learning from multi-source data according to claim 1, characterized in that the time series analysis model is a long short-term memory (LSTM) network, and the specific steps for inputting the interaction metadata sequence into the time series analysis model are as follows: the time series analysis model processes the interaction feature vector $x_i(t)$ of each time window sequentially along time step t from 1 to T, and at each time step t updates its internal hidden state $h_i(t)$ and cumulative knowledge state vector $c_i(t)$, the update process following the formula: $\left(h_i(t),\ c_i(t)\right) = \mathrm{LSTM}\left(x_i(t),\ h_i(t-1),\ c_i(t-1)\right)$; where $h_i(t-1)$ and $c_i(t-1)$ are the hidden state and cumulative knowledge state vector of the previous time step, respectively; the hidden state $h_i(T)$ output by the time series analysis model at the last time step T, after processing the entire sequence, is used as a dynamic feature representation that incorporates time series information.
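The recurrent update in claim 7 can be made concrete with a small, dependency-free LSTM cell. This is a sketch only: it uses the standard LSTM gate equations, and the parameters produced by `constant_params` are hypothetical untrained values chosen so the example is reproducible. A practical system would use a trained model from a deep learning library.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights: Matrix, inputs: Vector, bias: Vector) -> Vector:
    """Compute weights @ inputs + bias with plain lists."""
    return [sum(w * v for w, v in zip(row, inputs)) + b for row, b in zip(weights, bias)]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector, p: Dict[str, list]) -> Tuple[Vector, Vector]:
    z = x + h_prev  # concatenate the window features with the previous hidden state
    forget = [sigmoid(v) for v in affine(p["W_f"], z, p["b_f"])]
    write = [sigmoid(v) for v in affine(p["W_i"], z, p["b_i"])]
    expose = [sigmoid(v) for v in affine(p["W_o"], z, p["b_o"])]
    candidate = [math.tanh(v) for v in affine(p["W_g"], z, p["b_g"])]
    # Cell state: keep part of the old memory and add part of the candidate.
    c = [f * c_old + i * g for f, c_old, i, g in zip(forget, c_prev, write, candidate)]
    h = [o * math.tanh(c_new) for o, c_new in zip(expose, c)]
    return h, c


def run_lstm(sequence: List[Vector], p: Dict[str, list], hidden_size: int) -> Tuple[Vector, Vector]:
    """Process the windows in order and return the final hidden and cell states."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step([float(v) for v in x], h, c, p)
    return h, c


def constant_params(value: float, input_size: int, hidden_size: int) -> Dict[str, list]:
    """Hypothetical untrained parameters: every weight and bias equals `value`."""
    params: Dict[str, list] = {}
    for gate in "fiog":
        params["W_" + gate] = [[value] * (input_size + hidden_size) for _ in range(hidden_size)]
        params["b_" + gate] = [value] * hidden_size
    return params


windows = [[1, 0, 2, 0, 0], [0, 3, 1, 1, 0], [0, 1, 4, 2, 1]]
h_final, c_final = run_lstm(windows, constant_params(0.1, 5, 2), hidden_size=2)
```

The cell state plays the role of the cumulative knowledge state vector, and `h_final` corresponds to the hidden state at the last time step that the claim uses as the dynamic feature representation.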
8. The human resource intelligent analysis method based on machine learning from multi-source data according to claim 7, characterized in that the specific steps for constructing a dynamic knowledge influence profile are as follows: based on the dynamic feature representation $h_i(T)$, where $h_i(T)$ is the final hidden state and t=T is the last time window, influence indicators are calculated through a pre-defined linear transformation layer to construct a dynamic knowledge influence profile $P_i$; the influence indicators include at least Knowledge Creation ($KCS_i$), Knowledge Dissemination ($KDS_i$), and Interaction Stability ($ISS_i$), calculated as follows: $KCS_i = W_{KC} \cdot h_i(T) + b_{KC}$ and $KDS_i = W_{KD} \cdot h_i(T) + b_{KD}$, while $ISS_i$ is computed from $\mathrm{Var}\left(\{N_i(t)\}_{t=1}^{T}\right)$; where $W_{KC}$ is the weight of knowledge creation; $W_{KD}$ is the weight of knowledge dissemination; $b_{KC}$ is the bias term for knowledge creation; $b_{KD}$ is the bias term for knowledge dissemination; $\mathrm{Var}(\cdot)$ is the variance operator, i.e., a function that calculates variance; $N_i(t)$ is the total number of interactions by the employee within the time window t, specifically the sum of the interaction counts recorded for that window; the final dynamic knowledge influence profile is denoted as: $P_i = [KCS_i, KDS_i, ISS_i]$.
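The profile construction of claim 8 can be sketched as below. The knowledge creation and dissemination scores follow the linear-layer description in the claim, with hypothetical weights and biases. The exact interaction stability formula is not recoverable from the claim text, so the form `1 / (1 + variance)` used here is purely an assumption chosen so that steadier interaction totals give a higher score.

```python
from statistics import pvariance
from typing import List


def linear_score(weights: List[float], h: List[float], bias: float) -> float:
    """One output of a linear layer: dot(weights, h) + bias."""
    return sum(w * x for w, x in zip(weights, h)) + bias


def influence_profile(
    h_final: List[float],
    sequence: List[List[int]],
    w_kc: List[float],
    b_kc: float,
    w_kd: List[float],
    b_kd: float,
) -> List[float]:
    """Return [KCS, KDS, ISS] for one employee."""
    kcs = linear_score(w_kc, h_final, b_kc)
    kds = linear_score(w_kd, h_final, b_kd)
    # Total interactions per window: the sum of that window's counts.
    totals = [sum(window) for window in sequence]
    # ASSUMPTION: stability is taken as 1 / (1 + variance of the totals);
    # the claim only states that a variance of the totals is involved.
    iss = 1.0 / (1.0 + pvariance(totals))
    return [kcs, kds, iss]


profile = influence_profile(
    h_final=[0.5, -0.5],
    sequence=[[1, 2, 0, 0, 0], [0, 1, 1, 1, 0], [3, 0, 0, 0, 0]],
    w_kc=[1.0, 0.0], b_kc=0.25,   # hypothetical weights and biases
    w_kd=[0.0, 1.0], b_kd=1.0,
)
```

In the example every window has three interactions in total, so the variance is zero and the stability score takes its maximum value of 1.0 under the assumed formula.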
9. The human resource intelligent analysis method based on machine learning from multi-source data according to claim 8, characterized in that the specific steps for collecting and quantifying historical data on employee skill acquisition are as follows: define an organizational skill space containing M key skills; at discrete time points k, evaluate the proficiency of the i-th employee in the M key skills, forming the current employee's skill state vector $S_i(k)$ at time point k, represented as: $S_i(k) = [s_{i1}(k), s_{i2}(k), \ldots, s_{iM}(k)]$; where $s_{if}(k)$ is the quantitative proficiency score of employee i at time point k on the f-th skill, with 1≤f≤M; the skill state vectors at K time points are arranged in chronological order to form a multidimensional skill time-series vector sequence $\{S_i(k)\}_{k=1}^{K}$; based on the project set of the organization's future planning, all key skills required to complete these projects are identified and mapped to the organizational skill space of the M key skills; for each key skill f, a target demand level $d_f$ is set according to its importance, urgency, and demand frequency in future projects, thereby constructing a target skill demand vector D, represented as: $D = [d_1, d_2, \ldots, d_M]$; where vector D is the target reference for the alignment analysis; the specific steps for generating the future skill growth trajectory are as follows: the multidimensional skill time-series vector sequence is input into a time-series prediction model; the time-series prediction model analyzes the historical proficiency data of each skill in the sequence to learn its inherent trends, periodicity, and autocorrelation; based on the learned patterns, the time-series prediction model iteratively predicts the proficiency of each skill at multiple preset future time steps, and combines the predicted values of each skill into a future skill state vector sequence to form the future skill growth trajectory.
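The forecasting step of claim 9 can be illustrated with a deliberately simple stand-in. The claim does not fix a particular time-series prediction model, so this sketch substitutes a per-skill least-squares linear trend, which captures trend but not periodicity or autocorrelation. The function names, the proficiency scale of 0 to 1, and the clipping are assumptions.

```python
from typing import List, Tuple


def linear_trend(values: List[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of values against time indices 0..n-1."""
    n = len(values)
    mean_t = (n - 1) / 2.0
    mean_v = sum(values) / n
    denom = sum((t - mean_t) ** 2 for t in range(n))
    slope = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values)) / denom if denom else 0.0
    return slope, mean_v - slope * mean_t


def forecast_trajectory(history: List[List[float]], horizon: int, max_level: float = 1.0) -> List[List[float]]:
    """Extrapolate every skill `horizon` steps ahead, clipped to [0, max_level]."""
    num_points, num_skills = len(history), len(history[0])
    fits = [linear_trend([state[f] for state in history]) for f in range(num_skills)]
    trajectory = []
    for step in range(num_points, num_points + horizon):
        # One predicted skill state vector per future time step.
        trajectory.append(
            [min(max_level, max(0.0, slope * step + intercept)) for slope, intercept in fits]
        )
    return trajectory


# Three historical skill state vectors over M = 2 skills (hypothetical scores).
history = [[0.25, 0.5], [0.5, 0.5], [0.75, 0.5]]
trajectory = forecast_trajectory(history, horizon=2)
```

The first skill grows linearly and saturates at the assumed maximum level, while the second stays flat; each row of `trajectory` is one future skill state vector.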
10. The human resource intelligent analysis method based on machine learning from multi-source data according to claim 9, characterized in that, at each future time step, the predicted skill state vector and the target skill demand vector are compared item by item to calculate the difference between the predicted proficiency and the target demand level of each key skill; based on the pre-defined strategic importance weight of each skill, the differences are weighted and aggregated to generate a skill gap score that quantifies the gap between the employee and the organization's overall future needs at each future time step; the generated personalized skill growth prediction map is a structured dataset that integrates the following information: the employee's unique identifier, the employee's complete skill acquisition historical data sequence, the organization-level target skill demand vector, the dynamic knowledge influence profile Pᵢ, the future skill growth trajectory generated by the time-series prediction model, and the time-varying skill gap score sequence corresponding to the future skill growth trajectory.
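The alignment analysis of claim 10 can be sketched as a weighted aggregation of per-skill differences. Two choices below are assumptions rather than part of the claim: only shortfalls count (a predicted level above the target contributes zero), and the weighted sum is divided by the total weight so that scores are comparable across weight settings.

```python
from typing import List


def gap_score(predicted: List[float], target: List[float], weights: List[float]) -> float:
    """Weighted average shortfall of predicted proficiency against target demand."""
    # ASSUMPTION: surplus proficiency does not offset shortfalls in other skills.
    shortfalls = [max(0.0, d - s) for s, d in zip(predicted, target)]
    return sum(w * g for w, g in zip(weights, shortfalls)) / sum(weights)


def gap_score_sequence(trajectory: List[List[float]], target: List[float], weights: List[float]) -> List[float]:
    """One gap score per future time step of the predicted trajectory."""
    return [gap_score(state, target, weights) for state in trajectory]


trajectory = [[0.5, 0.25], [0.75, 0.5]]   # hypothetical predicted skill states
target = [0.75, 0.75]                     # hypothetical target demand vector
weights = [1.0, 3.0]                      # hypothetical strategic importance weights
scores = gap_score_sequence(trajectory, target, weights)
```

A decreasing score sequence indicates that the predicted trajectory is closing the gap to the target demand, and the heavily weighted second skill dominates the score in this example.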