AI-based enterprise personalized workplace learning system and method
By using a personalized workplace learning system, the problems of disconnect between the learning system and corporate strategy, limited content production, and limited recommendation dimensions have been solved. This has enabled a deep integration of training content with corporate strategy and dynamic adaptation of the learning system, thus building a self-evolving learning ecosystem.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CITIC UNITED CLOUD TECH CO LTD
- Filing Date
- 2026-04-03
- Publication Date
- 2026-06-30
AI Technical Summary
Existing AI-based enterprise workplace learning systems suffer from several shortcomings, including a disconnect between the learning system and corporate strategy, content production limited by existing knowledge resources, a single recommendation dimension lacking enterprise-level context, and system updates lagging behind business changes. These shortcomings make it difficult to deeply integrate corporate strategic characteristics, introduce high-quality external knowledge assets, and achieve dynamic perception and accurate recommendations.
A personalized workplace learning system for enterprises is constructed. The system collects multi-source data and performs semantic analysis through an enterprise profiling module to generate a digital profile of the enterprise. The system automatically infers a personalized learning system using a dynamic learning system generation module. The system combines multimodal content extraction and AIGC production modules to locate copyrighted content and reconstruct knowledge units. A personalized recommendation engine is used for personalized push notifications.
This ensures that training content is deeply aligned with the company's strategic direction, and that the learning system evolves dynamically with the company's development. It breaks through the limitations of traditional recommendation systems, builds a self-evolving learning ecosystem, and ensures that training and business develop in tandem.

Figure CN122309848A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of data processing and artificial intelligence technology, and more specifically to an AI-based personalized workplace learning system and method for enterprises.
Background Technology
[0002] Currently, with the rapid development of artificial intelligence technology, AI-based enterprise workplace learning systems have become an important tool for talent development. These systems typically integrate technologies such as large models, speech recognition, natural language processing, and big data analytics to provide digital training services for enterprises. Existing technical solutions mainly revolve around the following aspects:
[0003] At the system construction level, existing systems generally adopt a training framework based on job competency models. For example, Zhongke Soft's intelligent training solution is driven by competency models. By mining business data, it identifies high-potential outstanding personnel, forms profile models, and builds a recommendation system based on these profiles. Beisen Cool Academy's AI Learning platform has established a data platform based on job competency standards, combining competency and assessment technologies to define job competency requirements. Platforms such as Zhixue Cloud and Huawei Cloud Education Solutions, through big data analysis of benchmark companies in 14 major industries, have established assessment standards for key job functions and intelligently match learning programs based on employee assessment results.
[0004] In terms of content production, existing technologies generally employ AIGC-assisted course creation. Beisen's AI course assistant can generate structured outlines, PPT slides, and transcripts with a single click based on reference documents, and uses digital humans to record courses while simultaneously generating question banks and exam papers. Zhongke Software's solution utilizes large-scale model text understanding and generation capabilities to automatically generate exam questions, and leverages role-playing capabilities to simulate clients providing online tutoring for students. Uplimit's AI assistant suite promotes practical learning through AI role-playing and personalized feedback, with clients reporting a 95% or higher improvement in course creation efficiency. Huawei Cloud's solution relies on an AI-native application engine to accurately extract knowledge points from video, audio, and voice courseware, converting them into fragmented knowledge content and setting corresponding tags to form a knowledge graph.
[0005] At the personalized recommendation level, existing systems generally combine user profiling with recommendation algorithms. Platforms such as Litang iHR use machine learning algorithms (such as clustering, collaborative filtering, and content-based recommendation) to analyze employee characteristics and form dynamic, multi-dimensional user profiles. They then integrate and tag internal and external training resources, and perform precise matching and push notifications based on user profiles and resource tags. Beisen AI Learning Assistant can recommend newly released courses and courses from learning plans to employees based on their roles, achieving learning tailored to each individual. Platforms such as Tencent Lexiang use large models combined with RAG technology to achieve multimodal interaction and AI question bank generation. Some solutions also introduce reinforcement learning mechanisms to optimize recommendation strategies in real time based on employee feedback and learning outcomes.
[0006] However, the aforementioned AI-based enterprise workplace learning systems and methods still have the following shortcomings in practical applications:
[0007] First, the learning system is disconnected from corporate strategy. While existing systems have established training frameworks based on job competency models, their construction relies primarily on pre-set job standards and historical data mining, lacking in-depth integration of dynamic and personalized information such as corporate strategic development direction, core business pain points, and corporate culture characteristics. This "standardized job model" is difficult to truly adapt to the personalized needs of companies at different stages of development and in different industry backgrounds. As reported by 36Kr, traditional knowledge bases only solve the problems of "storage" and "retrieval," not the problem of "growth." When corporate knowledge such as salary systems and performance evaluation standards change daily, if it cannot be entered into the database, even the smartest AI can only retrieve "yesterday's map."
[0008] Second, content production remains constrained by existing knowledge resources. While current technologies have automated course production using AIGC (AI-generated content), the content primarily relies on existing materials such as documents, product manuals, and case studies uploaded internally by companies. When companies need to introduce authoritative external knowledge (such as cutting-edge industry theories, benchmark practices, and professional methodologies), they still need to rely on manual screening and integration. This "self-production and self-consumption" content production model results in learning content often being limited to the company's existing experience, making it difficult to introduce high-quality external knowledge assets for cross-industry integration and innovative transformation.
[0009] Third, the recommendation dimensions are too narrow and lack enterprise-level context. While existing recommendation systems achieve personalized push notifications based on employee profiles, their recommendation dimensions mainly focus on the individual level—job title, job level, historical learning records, interests, and preferences. The system lacks a holistic understanding of the company's current strategic priorities, business development directions, and organizational capability weaknesses, resulting in recommended courses that may match individual employee interests but not necessarily the company's most pressing talent development needs.
[0010] Fourth, the system and content updates lag behind business changes. Once established, the learning framework of existing systems often remains static, and content updates largely rely on manual triggering. When a company adjusts its business direction, launches new products, or changes its market strategy, the learning system struggles to respond promptly, resulting in "training following the business" rather than "training leading business development." McKinsey research points out that generative AI, once integrated with business processes, can unleash enormous productivity for the global economy, but this requires continuous updates to the underlying knowledge and seamless integration with permissions and compliance; otherwise, even the fastest information will be inaccurate.
[0011] In conclusion, how to build a workplace learning system that deeply integrates corporate strategic characteristics, introduces high-quality external knowledge assets, dynamically perceives enterprise-level context and recommends accurately, and updates itself automatically has become a technical problem that urgently needs to be solved in this field.
Summary of the Invention
[0012] To address the shortcomings of existing technologies, this invention discloses an AI-based personalized workplace learning system for enterprises, the technical solution of which is as follows:
[0013] A personalized workplace learning system for enterprises based on AI, characterized in that it includes:
[0014] The enterprise profile building module collects multi-source data from both internal and external sources and performs semantic analysis using an AI model to construct a digital profile representing the enterprise's industry characteristics, core business, and cultural features. The dynamic learning system generation module, connected to the enterprise profile building module, inputs the enterprise digital profile into a large language model to automatically generate a personalized learning system that matches the enterprise's strategic development path. The multimodal content extraction and AIGC production module, connected to a preset copyrighted content database, uses retrieval-augmented generation (RAG) to locate copyrighted content according to the needs of the learning system, and uses an AIGC model to reconstruct that content into knowledge units suitable for workplace learning. The personalized recommendation engine, connected to both the dynamic learning system generation module and the multimodal content extraction and AIGC production module, combines individual employee tags to filter the generated knowledge units and push personalized learning content.
[0015] This invention also discloses an AI-based personalized workplace learning method for enterprises. This method, based on the aforementioned system implementation, includes the following steps: S1: Constructing a digital profile of the enterprise that includes industry, business, and culture, specifically including:
S1.1: Multi-source heterogeneous data collection. External industry research reports, policy documents, competitor dynamics, and market sentiment are collected through a distributed crawler system that complies with the Robots Exclusion Protocol. At the same time, an encrypted connection to the enterprise database is established through a JDBC connection pool to extract the organizational structure, job descriptions, performance appraisal results, business process SOP documents, anonymized internal communication records, and corporate culture manuals.
[0017] S1.2: Data cleaning and preprocessing. Rule-based data cleaning algorithms are used to perform range verification, missing value handling and outlier detection on numerical data. For text data, HTML tags and special symbols are removed using regular expressions. Chinese words are segmented using the jieba word segmenter combined with a custom enterprise terminology dictionary. Stop words are removed and stemming is performed on English terms.
[0018] S1.3: Multimodal semantic analysis, using a hybrid architecture design to encode the cleaned data:
S1.3.1: For long texts such as industry research reports and policy documents, a BERT-large model that has undergone domain-adaptive pre-training on the Chinese Wikipedia corpus and a finance and economics corpus performs paragraph-level semantic encoding to generate industry feature vectors;
S1.3.2: For short texts such as corporate culture manuals and value keywords, a Sentence-BERT model trained with a contrastive learning loss function generates sentence-level semantic vectors, which serve as cultural value dimension vectors;
[0021] S1.3.3: For business process SOP documents, information extraction technology is used to identify process nodes and the temporal and logical relationships between nodes, a graph neural network model is constructed to represent the knowledge graph, and business capability graph vectors are generated.
[0022] S1.4: Cross-modal semantic alignment, constructing a trimodal contrastive learning framework, using the industry vector, culture vector, and business vector of the same enterprise as positive sample triples, randomly sampling the modal vectors of other enterprises as negative samples, and training through the InfoNCE contrastive loss function to map the three modal features to a unified semantic space;
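The trimodal alignment in S1.4 can be illustrated with a small, dependency-free InfoNCE sketch. It is not the patented training code: it assumes cosine similarity with an illustrative temperature of 0.1, uses the other enterprises in the same batch as the negative samples (a common variant of random negative sampling), and averages the loss over the three modality pairs. Encoders and gradient updates are left out.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchors: List[Vector], positives: List[Vector], tau: float = 0.1) -> float:
    """anchors[i] and positives[i] belong to enterprise i; positives[j], j != i, act as negatives."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max before exp() for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / n


def trimodal_loss(industry: List[Vector], business: List[Vector],
                  culture: List[Vector], tau: float = 0.1) -> float:
    """Average InfoNCE over the industry-business, business-culture and culture-industry pairs."""
    pairs = [(industry, business), (business, culture), (culture, industry)]
    return sum(info_nce(a, p, tau) for a, p in pairs) / len(pairs)
```

When the three modality vectors of each enterprise already point the same way, the loss is close to zero; swapping one modality between enterprises drives it up, which is the signal that pulls matching triples together in the shared space.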
[0023] S1.5: Hierarchical attention feature fusion introduces a multi-head self-attention mechanism to calculate the interaction weights of different feature subspaces, learns the importance weights of each feature domain through the SENet structure, and concatenates the weighted industry feature vector, business capability graph vector and cultural value dimension vector to form the final enterprise multi-dimensional digital profile vector and generate an interpretable tagged representation.
S2: Generating a customized workplace learning system from the enterprise profile based on a large language model, specifically including:
S2.1: Prompt engineering. The enterprise's multi-dimensional digital profile is converted into a structured prompt template with a three-part structure: the first part defines the task objectives, the middle part lists the key attributes of the enterprise profile as key-value pairs, and the last part specifies the output format requirements;
S2.2: Chain-of-thought guided reasoning. A four-step reasoning process guides the large model to complete the task step by step:
[0027] S2.2.1: Identify typical business scenarios and core positions in the industry based on industry characteristics in the profile;
[0028] S2.2.2: Identify the current capabilities of an enterprise based on the business capability graph in the profile;
S2.2.3: Determine the value orientation that the learning content should emphasize based on the cultural value dimensions in the profile;
[0030] S2.2.4: Based on the above analysis, generate a competency model, a job learning path map, and key knowledge areas;
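The three-part prompt of S2.1 and the four reasoning steps of S2.2 could be assembled roughly as follows. The task wording, the step texts, and the function name `build_prompt` are hypothetical; only the three-part layout (objective, key-value profile attributes, output format) and the order of the reasoning steps come from the description.

```python
import json
from typing import Dict, List

# Illustrative wording of the four reasoning steps S2.2.1 to S2.2.4
REASONING_STEPS: List[str] = [
    "Identify typical business scenarios and core positions from the industry characteristics.",
    "Identify the enterprise's current capabilities from the business capability graph.",
    "Determine the value orientation to emphasize from the cultural value dimensions.",
    "Generate a competency model, job learning path maps, and key knowledge areas.",
]


def build_prompt(profile: Dict[str, str], output_schema: Dict[str, str]) -> str:
    # Part 1: task objective
    objective = "Task: design a workplace learning system aligned with this enterprise's strategy."
    # Part 2: profile attributes as key-value pairs
    attributes = "\n".join(f"- {key}: {value}" for key, value in profile.items())
    steps = "\n".join(f"Step {i}: {text}" for i, text in enumerate(REASONING_STEPS, start=1))
    # Part 3: output format requirements
    output_format = "Output format: return JSON matching this schema:\n" + json.dumps(output_schema, indent=2)
    return (f"{objective}\n\nEnterprise profile:\n{attributes}\n\n"
            f"Reason step by step:\n{steps}\n\n{output_format}")
```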
S2.3: Retrieval-augmented generation. Before the prompt is input into the large language model, the reference cases most relevant to the current enterprise profile are retrieved from the enterprise knowledge base (including the industry best practice library, the historical training record library, and the performance improvement case library) through vector similarity retrieval. After a dual recall strategy balances relevance and diversity, the cases are concatenated into the prompt as contextual information.
S2.4: Structured output of the learning system. The multi-level learning system is output in JSON-LD format: the top layer is the capability domain, the middle layer is the capability item, and the bottom layer is the capability indicator and learning objective. At the same time, a mapping between job positions, capabilities, and learning content is constructed, and the job learning path graph is stored as a directed graph in which nodes represent learning units and edges represent prerequisite relationships and dependencies. Key knowledge domains are extracted from the capability model and labeled with metadata such as relevance to business scenarios, difficulty level, and expected learning time.
S2.5: Dynamic system evolution. The learning system is iterated dynamically through two mechanisms, periodic triggering and event triggering. Periodic triggering automatically rebuilds the system on a preset cycle, while event triggering responds to real-time events from the enterprise's business systems with local incremental updates. A version control strategy manages the system's iteration history and supports A/B testing and system rollback.
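Because S2.4 stores the job learning path as a directed graph whose edges are prerequisite relationships, a valid study order is simply a topological order of that graph. The sketch below assumes Python 3.9+ for the standard-library `graphlib` module; the unit names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def study_order(prerequisites: Dict[str, Set[str]]) -> List[str]:
    """Map each learning unit to the units that must precede it and return one valid order.

    A cycle means the generated path graph is inconsistent, so it is reported instead of ignored.
    """
    try:
        return list(TopologicalSorter(prerequisites).static_order())
    except CycleError as err:
        # graphlib places the offending cycle in the second element of args
        raise ValueError(f"learning path graph contains a cycle: {err.args[1]}") from err
```

Running this check on every regenerated system is a cheap way to catch a large-model output whose "prerequisites" loop back on themselves.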
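The versioning in S2.5 can be pictured with a small in-memory history. This is a hypothetical sketch: a learning system is modeled as a plain dictionary, an event-triggered incremental update copies the latest version and overwrites only the changed keys, and a rollback re-commits an earlier version rather than deleting history, so the audit trail stays intact. A/B testing, which would serve two versions side by side, is not shown.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SystemVersions:
    history: List[Dict[str, Any]] = field(default_factory=list)

    def commit(self, system: Dict[str, Any]) -> int:
        """Store a deep copy (a full, periodic rebuild) and return its version number."""
        self.history.append(copy.deepcopy(system))
        return len(self.history) - 1

    def incremental_update(self, changes: Dict[str, Any]) -> int:
        """Event-triggered local update: only the changed keys are replaced."""
        latest = copy.deepcopy(self.history[-1]) if self.history else {}
        latest.update(changes)
        return self.commit(latest)

    def rollback(self, version: int) -> int:
        """Re-commit an earlier version as the newest one; nothing is erased."""
        return self.commit(self.history[version])

    @property
    def current(self) -> Dict[str, Any]:
        return self.history[-1]
```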
S3: Extracting and generating high-density workplace knowledge units from its own copyrighted content library based on RAG and AIGC technologies, specifically including:
[0035] S3.1: Construct a vectorized index for copyrighted content and adopt a differentiated processing strategy for content of different modalities;
S3.1.1: For e-book content, a hybrid segmentation algorithm is adopted with chapters as boundaries and paragraphs as basic units. The length of each text block is kept within 512 words ± 10% (approximately 461 to 563 words), adjacent windows overlap by 10%, and high-dimensional vectors are generated through an embedding model;
S3.1.2: For audio and video content, speech is converted into text using ASR based on an end-to-end Transformer architecture. A scene detection algorithm fuses visual shot boundaries with semantic topic boundaries for segmentation, and text description vectors are generated for the resulting fragments;
S3.1.3: For chart-type content, the CLIP multimodal model extracts a joint representation of visual features and text descriptions, and the visual feature vector and text feature vector are fused by weighting to form the chart joint representation vector;
[0039] S3.1.4: Store all content fragment vectors in the Milvus vector database, establish the HNSW index structure, and simultaneously build a keyword inverted index based on Elasticsearch;
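The windowing rule in S3.1.1 can be sketched as below, under simplifying assumptions: words are whitespace-separated tokens (Chinese text would need a character or token count instead), and windows are cut at a fixed length rather than packed paragraph by paragraph. Chapter boundaries are respected by chunking each chapter separately.

```python
from typing import List


def chunk_chapter(words: List[str], size: int = 512, overlap_ratio: float = 0.10) -> List[List[str]]:
    """Slide a window of `size` words over one chapter; neighbours share overlap_ratio * size words."""
    overlap = int(size * overlap_ratio)      # 51 words with the defaults
    stride = max(1, size - overlap)          # guard against a zero stride
    chunks: List[List[str]] = []
    start = 0
    while start < len(words):
        chunks.append(words[start:start + size])
        if start + size >= len(words):       # the last window reached the chapter end
            break
        start += stride
    return chunks


def chunk_book(chapters: List[str], size: int = 512, overlap_ratio: float = 0.10) -> List[List[str]]:
    """Chunk chapter by chapter, so no window ever spans a chapter boundary."""
    return [chunk for chapter in chapters
            for chunk in chunk_chapter(chapter.split(), size, overlap_ratio)]
```

With the defaults, a 1,000-word chapter yields windows of 512, 512, and 78 words; in practice a tail that short would probably be merged into its neighbour to stay near the 512 ± 10% target.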
[0040] S3.2: RAG retrieval and recall. After receiving a content production request, the requested content is converted into a query vector through the same embedding model. Approximate nearest neighbor retrieval is performed in the vector database. An MMR diversity penalty factor is introduced to balance relevance and diversity, and the top K content fragments that are most relevant to the request semantics are recalled.
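The MMR step in S3.2 re-selects from the approximate-nearest-neighbour results so that near-duplicate fragments do not crowd out distinct ones. A minimal greedy sketch, assuming the ANN candidates are already available as a list of vectors and using an illustrative trade-off weight `lam` (1.0 means pure relevance, lower values penalize redundancy more):

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def mmr_select(query: Sequence[float], candidates: List[Sequence[float]],
               k: int, lam: float = 0.7) -> List[int]:
    """Greedy MMR: pick k candidate indices, trading query relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # similarity to the closest fragment already chosen
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a near-duplicate among the candidates, pure relevance (`lam=1.0`) returns both copies, while a lower weight swaps the duplicate for a less similar but distinct fragment.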
[0041] S3.3: Multi-agent collaborative content restructuring and rewriting, employing a multi-agent collaborative architecture to process the original fragments:
[0042] S3.3.1: The first agent analyzes the logical structure and core arguments of the original fragment, extracts the chapter structure, and outputs the summary skeleton;
S3.3.2: The second agent extracts professional terms through a terminology recognition model, supplements explanatory content from an internal knowledge base or external knowledge sources, and generates terminology explanations;
S3.3.3: The third agent adapts the structured content to the specified workplace learning format, such as micro-lesson scripts, case studies, script libraries, and mind maps, and organizes and expresses the content differently for each learning format;
[0045] S3.4: Knowledge traceability and copyright protection. During the content generation process, metadata such as document ID, paragraph position, and copyright information of the original fragments are retained and embedded in the file attributes of the generated content. For video content, frequency domain watermarking technology based on DCT transform is used, and for text content, semantic watermarking technology based on synonym replacement is used to achieve copyright tracking.
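The synonym-substitution idea in S3.4 can be shown with a toy that hides one bit at each occurrence of a word from a synonym pair. The pairs are hypothetical, matching is naive whitespace tokenization (punctuation attached to a word defeats it), and a production scheme would need context-checked synonyms and error correction; the DCT-based video watermark is not covered.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym pairs: variant 0 encodes bit 0, variant 1 encodes bit 1
SYNONYM_PAIRS: List[Tuple[str, str]] = [
    ("use", "employ"),
    ("help", "assist"),
    ("show", "demonstrate"),
    ("start", "begin"),
]
# word -> (pair index, variant index)
LOOKUP: Dict[str, Tuple[int, int]] = {
    word: (p, v) for p, pair in enumerate(SYNONYM_PAIRS) for v, word in enumerate(pair)
}


def embed(text: str, bits: List[int]) -> str:
    """Rewrite carrier words so that their variant index spells out `bits` in order."""
    out, k = [], 0
    for token in text.split():
        hit = LOOKUP.get(token.lower())
        if hit is not None and k < len(bits):
            token = SYNONYM_PAIRS[hit[0]][bits[k]]
            k += 1
        out.append(token)
    return " ".join(out)


def extract(text: str) -> List[int]:
    """Read back the variant index of every carrier word."""
    return [LOOKUP[t.lower()][1] for t in text.split() if t.lower() in LOOKUP]
```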
[0046] S3.5: Multi-dimensional quality assessment, automating the evaluation of generated knowledge units:
[0047] S3.5.1: Content fidelity assessment, which calculates the vector similarity between the generated content and the original fragment based on bidirectional semantic similarity alignment, and verifies the integrity of key entities;
[0048] S3.5.2: Readability assessment, using the Flesch-Kincaid scale to calculate text difficulty and monitor term density;
S3.5.3: Scenario matching evaluation, which calculates the cosine similarity between the generated content vector and the target business scenario vector;
Knowledge units that pass the evaluation are stored in the knowledge unit library; those that fail trigger regeneration or manual review.
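The three checks in S3.5 and the pass/fail routing can be combined into one gate. The sketch assumes embedding vectors are already computed, uses an exact-substring test for key entities, leaves syllable counting to the caller, and uses hypothetical thresholds. The Flesch-Kincaid grade formula itself is the standard one, 0.39 × (words per sentence) + 11.8 × (syllables per word) - 15.59, and is designed for English text.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


@dataclass(frozen=True)
class Thresholds:  # hypothetical cut-offs, to be tuned per deployment
    min_fidelity: float = 0.80
    max_grade: float = 12.0
    min_scene_match: float = 0.60


@dataclass
class GeneratedUnit:
    text: str
    vector: Sequence[float]
    words: int
    sentences: int
    syllables: int


def passes_quality_gate(unit: GeneratedUnit, source_vec: Sequence[float],
                        scene_vec: Sequence[float], key_entities: Iterable[str],
                        th: Thresholds = Thresholds()) -> bool:
    # S3.5.1: similarity to the source fragment plus presence of every key entity
    fidelity_ok = (cosine(unit.vector, source_vec) >= th.min_fidelity
                   and all(entity in unit.text for entity in key_entities))
    # S3.5.2: readability
    readable = flesch_kincaid_grade(unit.words, unit.sentences, unit.syllables) <= th.max_grade
    # S3.5.3: match with the target business scenario
    on_scene = cosine(unit.vector, scene_vec) >= th.min_scene_match
    return fidelity_ok and readable and on_scene
```

A `False` result corresponds to the "regenerate or send to manual review" branch; the sketch does not decide between the two.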
S4: Pushing knowledge units to employee terminals in a personalized manner through a recommendation model combined with employee tags, specifically including:
[0052] S4.1: Multi-source user feature construction, integrating static features (job level, department, years of service, historical learning completion rate, certification qualifications), dynamic behavioral features (vectorized representation of search keywords in the past 7 days, LSTM encoding results of clickstream sequence, distribution of video viewing time, and exit position of incomplete knowledge units), and enterprise context features (current enterprise strategic direction vector, semantic embedding of recent business pain points, and organizational capability weakness labels).
S4.2: Time-decay behavior modeling. Users' historical interaction behaviors are assigned exponentially decaying weights $w = e^{-\lambda \Delta t}$, where Δt is the time interval between the occurrence of the behavior and the present, and λ is the decay coefficient, which is adjusted dynamically according to the characteristics of the learning domain (a larger λ for domains where skills update quickly, and a smaller λ for domains of basic knowledge).
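Assuming the decay weight in S4.2 takes the usual exponential form w = exp(-λ·Δt), a user's per-domain interest can be accumulated as below. Δt is measured in days here and the λ values are hypothetical; they only illustrate that a fast-moving domain forgets old interactions faster than a foundational one.

```python
import math
from typing import Dict, List, Tuple


def decay_weight(delta_t_days: float, lam: float) -> float:
    """Exponential time-decay weight exp(-lambda * delta_t)."""
    return math.exp(-lam * delta_t_days)


def domain_interest(events: List[Tuple[str, float]], lam_by_domain: Dict[str, float],
                    default_lam: float = 0.05) -> Dict[str, float]:
    """events holds (learning_domain, days_since_interaction) pairs; returns decayed totals per domain."""
    scores: Dict[str, float] = {}
    for domain, days_ago in events:
        lam = lam_by_domain.get(domain, default_lam)
        scores[domain] = scores.get(domain, 0.0) + decay_weight(days_ago, lam)
    return scores
```

Two interactions of the same age thus count very differently: with λ = 0.2 a ten-day-old click in a fast-changing domain keeps about 14% of its weight, while with λ = 0.02 a click in a basic-knowledge domain keeps about 82%.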
[0054] S4.3: Dual-tower model encoding and dynamic attention feature cross-encoding: An improved DSSM dual-tower structure is used to encode user features and knowledge unit features separately. During the encoding process, a SENet dynamic attention feature cross-encoding layer is introduced.
[0055] S4.3.1: The Squeeze operation compresses the embeddings of each feature domain into a scalar using global average pooling;
[0056] S4.3.2: Excitation operation, which learns the importance weights of each feature domain through two fully connected layers;
[0057] S4.3.3: Reweight operation, which multiplies the learned weights with the original features to complete feature recalibration;
[0058] The weighted user feature vector and the knowledge unit feature vector are used to calculate the cosine similarity to generate a matching score.
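The Squeeze, Excitation, and Reweight operations of S4.3.1 to S4.3.3, followed by the cosine matching score, can be traced in plain Python. This sketch covers the user tower only; the weight matrices `W1` (reduction) and `W2` (expansion) would be learned in training and are supplied by hand here.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def senet_reweight(fields: List[List[float]], W1: Matrix, W2: Matrix) -> List[List[float]]:
    # Squeeze: global average pooling turns each field embedding into one scalar
    z = [sum(f) / len(f) for f in fields]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one importance weight per field
    hidden = [max(0.0, h) for h in matvec(W1, z)]
    weights = [1.0 / (1.0 + math.exp(-a)) for a in matvec(W2, hidden)]
    # Reweight: scale every field embedding by its weight
    return [[weights[i] * v for v in f] for i, f in enumerate(fields)]


def match_score(user_fields: List[List[float]], W1: Matrix, W2: Matrix,
                unit_vec: Sequence[float]) -> float:
    """Concatenate the recalibrated user fields and compare with a knowledge unit vector."""
    user_vec = [v for f in senet_reweight(user_fields, W1, W2) for v in f]
    return cosine(user_vec, unit_vec)
```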
[0059] S4.4: Cold start processing, for new employees or users with sparse historical behavior, adopts a knowledge graph-based embedding method:
[0060] S4.4.1: Construct a job knowledge graph, where nodes include entities such as job title, department, skills, and certifications, and edges represent the relationships between entities;
[0061] S4.4.2: Map the attribute features of new employees to the knowledge graph space and generate embedding vectors through graph neural networks;
[0062] S4.4.3: Find the group of mature employees most similar to the employee's attributes, and generate an initial recommendation list based on group preferences;
[0063] S4.4.4: Reorder the initial list based on the company's current strategic priorities;
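S4.4.3 and S4.4.4 can be sketched as a nearest-neighbour vote followed by a strategic re-ranking. The sketch assumes the graph neural network of S4.4.2 has already produced the embeddings; the similarity-weighted vote and the linear blend with weight `alpha` are illustrative choices, not taken from the description.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def cold_start_ranking(new_emb: Sequence[float],
                       mature_embs: Dict[str, Sequence[float]],
                       preferences: Dict[str, Dict[str, float]],
                       strategic_priority: Dict[str, float],
                       k: int = 3, alpha: float = 0.3) -> List[str]:
    # S4.4.3: the k mature employees whose embeddings are closest to the new employee
    sims = {emp: cosine(new_emb, emb) for emp, emb in mature_embs.items()}
    neighbours = sorted(sims, key=sims.get, reverse=True)[:k]
    # Similarity-weighted group preference for each knowledge unit
    group: Dict[str, float] = {}
    for emp in neighbours:
        for unit, pref in preferences.get(emp, {}).items():
            group[unit] = group.get(unit, 0.0) + sims[emp] * pref
    # S4.4.4: blend in the company's current strategic priority before re-ordering
    final = {u: (1 - alpha) * s + alpha * strategic_priority.get(u, 0.0) for u, s in group.items()}
    return sorted(final, key=final.get, reverse=True)
```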
[0064] S4.5: Two-stage screening and refined sorting:
S4.5.1: In the first stage, the dual-tower model quickly recalls the top several hundred candidate knowledge units most relevant to the user from the knowledge unit library;
[0066] S4.5.2: In the second stage, ranking models such as DeepFM are used to refine the ranking of the candidate set, and cross features such as the matching degree between user positions and knowledge unit difficulty levels, and the matching degree between user historical learning time distribution and knowledge unit duration are introduced.
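The two-stage funnel of S4.5 is sketched below with a deliberately simple stand-in for the second stage: instead of a trained DeepFM model, a hand-written score adds two of the cross features named in S4.5.2 (job level against difficulty, usual session length against unit duration) to the recall score. The difficulty scale and the 0.5 weights are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class KnowledgeUnit:
    uid: str
    vector: Sequence[float]
    difficulty: int  # hypothetical scale: 1 (introductory) .. 5 (expert)
    minutes: float


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def recall(user_vec: Sequence[float], units: List[KnowledgeUnit], n: int) -> List[KnowledgeUnit]:
    """Stage 1: cheap two-tower style recall by inner product."""
    return heapq.nlargest(n, units, key=lambda u: dot(user_vec, u.vector))


def rerank(user_vec: Sequence[float], user_level: int, usual_minutes: float,
           candidates: List[KnowledgeUnit]) -> List[KnowledgeUnit]:
    """Stage 2: hand-written stand-in for a learned ranker, using two cross features."""
    def score(u: KnowledgeUnit) -> float:
        level_fit = 1.0 - abs(u.difficulty - user_level) / 4.0
        time_fit = min(u.minutes, usual_minutes) / max(u.minutes, usual_minutes)
        return dot(user_vec, u.vector) + 0.5 * level_fit + 0.5 * time_fit
    return sorted(candidates, key=score, reverse=True)
```

The point of the split is cost: the inner product is cheap enough to scan the whole library, while the richer second-stage score only runs on the few hundred survivors.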
S4.6: Multi-terminal adaptation and real-time scenario triggering. Push notifications are supported on multiple terminals such as PC, mobile, and WeChat Work (embedded), and the user's operations in business systems are monitored through APIs so that instant push notifications can be triggered by real-time scenarios;
[0068] S4.7: Reinforcement learning dynamic feedback optimization continuously tracks user feedback behavior on pushed content (clicks, completions, favorites, shares, and subsequent performance changes), and dynamically adjusts the recommendation strategy through reinforcement learning algorithms to form a closed loop of "recommendation-feedback-optimization".
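The feedback loop in S4.7 is described only as "reinforcement learning". As the simplest illustration, an epsilon-greedy multi-armed bandit is swapped in below: each arm is a recommendation strategy, and the reward combines the feedback signals with hypothetical weights. Delayed signals such as later performance changes are omitted.

```python
import random
from typing import Dict, List


def feedback_reward(clicked: bool, completed: bool, favorited: bool, shared: bool) -> float:
    """Hypothetical reward weights for the immediate feedback signals."""
    return 0.1 * clicked + 0.5 * completed + 0.2 * favorited + 0.2 * shared


class EpsilonGreedy:
    """Epsilon-greedy bandit over candidate recommendation strategies (the 'arms')."""

    def __init__(self, arms: List[str], epsilon: float = 0.1, seed: int = 0) -> None:
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts: Dict[str, int] = {a: 0 for a in self.arms}
        self.values: Dict[str, float] = {a: 0.0 for a in self.arms}
        self.rng = random.Random(seed)

    def select(self) -> str:
        if self.rng.random() < self.epsilon:                  # explore
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])   # exploit the best estimate

    def update(self, arm: str, reward: float) -> None:
        self.counts[arm] += 1
        # Incremental mean: no need to store every past reward
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

Each push calls `select()`, and each observed feedback calls `update()`, which is the "recommendation-feedback-optimization" cycle in its most reduced form.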
[0069] Beneficial effects
[0070] (1) The learning system shifts from "one-size-fits-all" to "tailored to each enterprise", so that training content is deeply aligned with the enterprise's strategic direction, core business pain points, and cultural orientation. The system evolves dynamically with the development of the enterprise, fundamentally solving the technical problem of the disconnect between training and business.
[0071] (2) A deep coupling mechanism is constructed between authoritative knowledge assets and workplace application scenarios. The traditional manual course development cycle is compressed from weeks to hours, and large-scale, efficient knowledge unit production is achieved while the authority of the content is ensured through multimodal preprocessing, multi-agent collaborative generation, and knowledge traceability technology.
[0072] (3) Breaking through the limitations of traditional recommendation systems that focus only on personal interests, the system integrates the corporate strategic context into recommendation decisions and adaptively adjusts the weight distribution of job features, behavioral features, and strategic features through the dynamic attention feature cross-encoding layer, achieving joint optimization of personal development needs and organizational strategic orientation;
[0073] (4) A closed data loop is established from profile building and system generation through content production to precise recommendation. The parameters of each module are continuously optimized through user behavior feedback, forming a self-evolving learning ecosystem and ensuring that the enterprise's talent development always keeps pace with business development.
Attached Figure Description
[0074] Figure 1 is a schematic diagram of the system structure of the present invention.
Detailed Implementation
[0075] Example 1
[0076] The following is a detailed description of the four core functional modules of this invention in specific embodiments, covering the composition, internal relationships, functions, and working principles of each module. The description fully demonstrates the reasons for the selection of technical means and their effects, and incorporates relevant technical details from search results to enhance the feasibility and inventiveness of the solution.
[0077] In one specific implementation, the AI-based personalized workplace learning system of the present invention is deployed in an enterprise private cloud or hybrid cloud environment, and establishes connections with internal enterprise systems (such as HRIS, CRM, ERP) and external data sources through secure API interfaces. The system adopts a microservice architecture, with four core functional modules that are independent yet form an organic whole through standardized data interfaces: the multi-dimensional digital profile output by the enterprise profile building module serves as input to the dynamic learning system generation module; the capability model and learning path graph produced by the dynamic learning system generation module direct targeted content production by the multimodal content extraction and AIGC production module; and the personalized recommendation engine integrates the outputs of the above modules with real-time employee behavior data to deliver accurate push notifications. A closed-loop data flow is formed among the four modules: "profile drives system, system guides production, production supports recommendation, and recommendation feedback optimizes profile," ensuring that the system can evolve continuously.
[0078] In one specific implementation, the enterprise profiling module transforms the multidimensional characteristics of a complex organization—the enterprise—into a machine-understandable digital representation through the collection, cleaning, semantic analysis, and feature fusion of multi-source heterogeneous data. This lays the data foundation for the personalized generation of subsequent learning systems. The module's workflow involves a complete technical chain from raw data access to final profiling vector output, with each step's technical means designed and optimized to meet the specific needs of enterprise workplace learning scenarios.
[0079] At the data acquisition level, the module needs to process both publicly available data from outside the enterprise and private data from within, and the acquisition methods for these two types of data are fundamentally different. External data mainly includes industry research reports, policy documents, competitor activities, and market sentiment. This data is scattered across various websites, industry databases, and news portals, requiring targeted crawling using web crawler technology. A distributed crawler architecture is adopted based on both scale and efficiency considerations: when the scope of industries to be monitored expands or the crawling frequency increases, a single-machine crawler can easily become a performance bottleneck, and a failure can cause the entire collection task to be interrupted. Specifically, the module builds a distributed crawler cluster based on the Scrapy framework, separating the download service from the page parsing logic. The crawler master node is responsible for task scheduling, distributing the URLs to be crawled to multiple worker nodes; each worker node independently completes page download and preliminary parsing, and returns the results to the central storage. With this architecture, when the data volume in a certain industry surges, near-linear performance scaling can be achieved by adding worker nodes. Regarding compliance, the crawler system strictly adheres to the Robots Exclusion Protocol. Before initiating a request, it first checks the target website's robots.txt file, parsing the allowed and prohibited crawling paths declared therein. For explicitly prohibited directories or pages, the system automatically skips them; for allowed pages, it sets a reasonable User-Agent identifier in the request header and adds a random delay between requests to avoid excessive pressure on the target server. 
This strategy respects the website owner's rights and is a prerequisite for ensuring long-term stable data acquisition; otherwise, if the IP address is blocked, the entire data source will be interrupted.
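The robots.txt check described above can be done with Python's standard `urllib.robotparser`. The sketch parses an already-downloaded robots.txt body so it runs offline, honours a `Crawl-delay` line if one is declared, and adds random jitter between requests; the user-agent string and delay values are hypothetical, and the actual download and Scrapy integration are not shown.

```python
import random
from typing import List, Tuple
from urllib.robotparser import RobotFileParser

USER_AGENT = "EnterpriseProfileBot/1.0"  # hypothetical crawler identifier


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse an already-downloaded robots.txt body (no network access here)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_plan(parser: RobotFileParser, urls: List[str], base_delay: float = 1.0,
                jitter: float = 2.0, seed: int = 0) -> List[Tuple[str, float]]:
    """Keep only URLs the rules allow and attach a randomized pre-request delay to each."""
    rng = random.Random(seed)
    declared = parser.crawl_delay(USER_AGENT)  # None when no Crawl-delay applies
    delay = float(declared) if declared is not None else base_delay
    return [(url, delay + rng.uniform(0.0, jitter))
            for url in urls if parser.can_fetch(USER_AGENT, url)]
```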
[0080] For internal enterprise data, including organizational structure, job descriptions, performance appraisal results, business process SOP documents, internal communication records (strictly anonymized), and corporate culture manuals, the collection method is completely different. The module establishes an encrypted connection to the enterprise database via a JDBC connection pool. The connection pool technology reuses database connections, avoiding the complete connection establishment and release process for each query, significantly improving response speed during high-concurrency queries. SSL encryption is configured in the connection parameters to ensure data transmission security. Unstructured documents stored on a file server (such as Word versions of SOPs and PDF versions of corporate culture manuals) are retrieved in batches via the SFTP protocol. It is important to note that all internal communication records involving employee personal information are anonymized before collection, removing direct identifiers such as names and employee numbers, retaining only the semantic information of the communication content itself. This complies with data privacy regulations and eliminates privacy concerns in subsequent analysis.
[0081] The quality of the raw data varies greatly, so it must be cleaned and preprocessed before subsequent analysis. This step uses a rule-based data cleaning algorithm. The "rules" here are not a single rule. They are a combination of rules built according to data type and noise characteristics. For numerical data, the rules include range verification (e.g., a performance appraisal score must lie between 0 and 100), missing value handling (e.g., filling with the mean or mode), and outlier detection (e.g., identifying and removing data points more than three standard deviations from the mean according to the 3σ rule). For text data, noise mainly takes the form of HTML tags, advertising code, special symbols, duplicate content, and the like. The cleaning process first matches and removes HTML tags with regular expressions. It then filters out special symbols irrelevant to the text content using a preset symbol blacklist. After this basic cleaning, the text proceeds to Chinese word segmentation. The jieba segmenter is used with a custom enterprise terminology dictionary loaded. This dictionary is built by automatically extracting high-frequency professional vocabulary from the enterprise's historical documents and is stored in the database after manual review. It ensures that industry-specific terms such as "digital transformation" and "supply chain finance" are segmented correctly. Stop words are then removed from the segmented word sequence. In addition to general function words such as "de" (的), "le" (了), and "zai" (在), the stop word list is dynamically expanded according to industry characteristics. For example, in financial industry research reports, words that are frequent but carry little meaning in that context, such as "report" and "data", are also added to the stop word list. For English text, or English terms embedded in Chinese text, stemming (e.g., reducing "learning" to "learn") is applied to unify word forms.
The design logic of this rule combination is that different types of data require different processing strategies, and a single generic cleaning process cannot cope with the diversity of enterprise data. Only a configurable rule engine allows cleaning parameters to be customized for different sources, such as industry research reports, internal documents, and communication records. This removes as much noise as possible while retaining valid information.
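Several of the rules in [0081] can be sketched with the standard library: range check, mean imputation, 3σ removal, HTML stripping, symbol blacklist, and stop-word filtering. The blacklist contents, the thresholds, and the function names are assumptions for illustration. Segmentation itself is omitted to keep the block dependency-free. With jieba it would typically be `jieba.load_userdict(path)` followed by `jieba.lcut(text)`, and `remove_stopwords` expects such a token list.

```python
import re
import statistics

TAG_RE = re.compile(r"<[^>]+>")
# Hypothetical symbol blacklist; the real list is configured per data source.
SYMBOL_BLACKLIST = "★◆■●※"


def clean_scores(scores, low=0.0, high=100.0, k=3.0):
    """Range check, then mean imputation, then 3-sigma outlier removal."""
    # Range verification: out-of-range values are treated as missing.
    checked = [s if s is not None and low <= s <= high else None for s in scores]
    valid = [s for s in checked if s is not None]
    fill = statistics.fmean(valid)
    filled = [fill if s is None else s for s in checked]
    # 3-sigma rule: drop points more than k population standard deviations from the mean.
    mu = statistics.fmean(filled)
    sigma = statistics.pstdev(filled)
    return [s for s in filled if sigma == 0 or abs(s - mu) <= k * sigma]


def clean_text(raw: str) -> str:
    """Strip HTML tags and blacklisted symbols, then normalise whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = text.translate({ord(ch): None for ch in SYMBOL_BLACKLIST})
    return re.sub(r"\s+", " ", text).strip()


def remove_stopwords(tokens, base_stopwords, industry_stopwords=()):
    """Filter segmented tokens against the general list plus industry additions."""
    stop = set(base_stopwords) | set(industry_stopwords)
    return [t for t in tokens if t not in stop]


scores = [80.0] * 19 + [0.0, None, 150.0]
print(clean_scores(scores))
print(clean_text("<p>供应链金融★</p><br/>报告"))
```

The 3σ rule only removes anything once the sample is large enough. With n points and the population standard deviation, no z-score can exceed (n − 1)/√n, so samples of ten or fewer values are never trimmed.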
[0082] The cleaned text data is input into a multimodal semantic analysis engine. The engine uses a hybrid architecture because enterprise data varies widely in form and semantic granularity, and no single model performs optimally on all types. Long texts such as industry reports and policy documents are lengthy, logically structured, and dense with terminology. Encoding them must therefore capture both the macro-level semantics across paragraphs and the deep connections between terms. The module uses a pre-trained BERT-large model for paragraph-level semantic encoding. Based on the Transformer architecture, this model captures long-distance dependencies through multiple layers of bidirectional self-attention. More importantly, starting from a model pre-trained on general Chinese corpora, the module performs further domain-adaptive pre-training on the Chinese Wikipedia corpus and a finance and economics corpus. This step is crucial because a general BERT model is pre-trained mostly on everyday text. Its ability to represent professional terms such as "ROE" and "non-performing loan rate" is therefore limited. Further pre-training on domain-specific corpora shifts the model's representation space toward the professional domain, bringing related professional terms closer together in the vector space and improving accuracy on downstream tasks.
[0083] Short texts such as corporate culture manuals and value keywords are brief and semantically concise. What matters is their overall meaning at the sentence level rather than deep dependencies between words. Encoding them with BERT-large would waste model capacity on short inputs. It would also make discriminative sentence vectors hard to obtain. The conventional approach takes the output at the [CLS] position as the sentence vector, but the Sentence-BERT authors reported that such [CLS] vectors perform poorly on semantic textual similarity tasks. This module therefore uses the Sentence-BERT model to generate sentence-level semantic vectors. The core idea of Sentence-BERT is to place BERT in a Siamese network structure and train it with a contrastive objective. This objective pulls semantically similar sentence pairs closer in the vector space and pushes semantically dissimilar pairs farther apart. Contrastive learning is introduced because traditional language models are trained to predict masked words or the next sentence, which does not directly optimize the similarity relationship between sentences. Corporate profiling, however, requires extensive semantic similarity comparison between documents (such as judging whether two culture manuals advocate similar values). Contrastive learning explicitly guides the model to learn this similarity measure. During training, positive pairs consist of semantically similar sentences labeled by humans (such as "customer first" and "customer-centric"), while negative samples are drawn at random. The model adjusts its parameters by minimizing a triplet loss or a cross-entropy loss, ultimately producing a well-structured semantic vector space.
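The triplet objective mentioned in [0083] can be sketched on toy 2-D vectors standing in for Sentence-BERT embeddings. The anchor plays the role of "customer first", the positive "customer-centric", and the negative a randomly sampled sentence. The Euclidean distance and the margin of 1.0 are assumptions here, since the text does not fix either.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(d(a, p) - d(a, n) + margin, 0): zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


anchor = (1.0, 0.0)      # stand-in for "customer first"
positive = (0.9, 0.1)    # stand-in for "customer-centric"
far_negative = (-1.0, 0.0)
near_negative = (1.0, 0.1)

print(triplet_loss(anchor, positive, far_negative))   # already separated: no loss
print(triplet_loss(anchor, positive, near_negative))  # negative too close: positive loss
```

Training drives the second case toward the first by moving the encoder's outputs, which is how the semantic space acquires its structure.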
[0084] For business process Standard Operating Procedures (SOPs), the core information includes the textual descriptions of the steps and also the temporal and logical relationships between them. If SOPs are treated merely as plain text, this structural information is lost entirely. The module therefore introduces a graph neural network model to construct a knowledge graph representation of each business process. Specifically, information extraction techniques first identify process nodes (such as "receive order," "check inventory," and "confirm payment") and the transitions between them in the SOP text. These nodes serve as the vertices of the graph. Sequential relationships ("receive order" leads to "check inventory") and branching relationships (if inventory is sufficient, proceed to "confirm payment"; otherwise proceed to "purchase and replenish") serve as edges. The textual description of each node is encoded into an initial feature vector by the Sentence-BERT model described above. The graph neural network (such as a graph convolutional network, GCN, or a graph attention network, GAT) iteratively updates each node's representation by aggregating the features of its neighbors. A final pooling operation yields a graph-level vector for the entire business process. The core value of graph neural networks is that they preserve the topological structure of a process in vector space. Two processes whose texts look different will still be close in vector space after GNN encoding if their step order and decision logic are similar, which pure text models cannot achieve.
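The GNN encoding in [0084] can be sketched as a plain two-layer GCN in NumPy followed by mean pooling. The text names GCN or GAT but does not fix their details. Here the weights are random and the node features are random stand-ins for Sentence-BERT encodings. Note that the standard GCN normalisation used below symmetrises edges, so it does not distinguish "A before B" from "B before A". A direction-aware variant would be needed to fully preserve step order, for example separate weights for incoming and outgoing edges.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalisation D^-1/2 (A + I) D^-1/2 of a symmetrised graph."""
    sym = ((adj + adj.T) > 0).astype(float)   # standard GCN ignores edge direction
    a_hat = sym + np.eye(sym.shape[0])         # self-loops keep each node's own features
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt


def encode_process(adj: np.ndarray, features: np.ndarray, weights: list) -> np.ndarray:
    """Stacked GCN layers H' = ReLU(A_norm H W), then mean pooling to one graph vector."""
    a_norm = normalized_adjacency(adj)
    h = features
    for w in weights:
        h = np.maximum(a_norm @ h @ w, 0.0)
    return h.mean(axis=0)


# Toy SOP: receive order -> check inventory -> {confirm payment | purchase and replenish}
nodes = ["receive order", "check inventory", "confirm payment", "purchase and replenish"]
adj = np.zeros((4, 4))
adj[0, 1] = adj[1, 2] = adj[1, 3] = 1.0
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))   # stand-ins for Sentence-BERT node encodings
weights = [rng.normal(size=(8, 8)), rng.normal(size=(8, 4))]
graph_vec = encode_process(adj, features, weights)
print(graph_vec.shape)  # (4,)
```

Because mean pooling ignores node order, relabeling the nodes of the same process yields the same graph vector.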
[0085] At this point, the module has generated enterprise features in three different modalities: industry feature vectors (from research reports encoded by BERT), cultural value dimension vectors (from culture manuals encoded by Sentence-BERT), and business capability graph vectors (from business processes encoded by the GNN). However, these vectors reside in separate semantic spaces, and concatenating them directly leads to an "apples and oranges" problem. The numerical distributions and dimensional meanings of features from different modalities are not comparable, which makes effective joint inference impossible. To solve this semantic alignment problem, the module includes a cross-modal alignment layer. This layer uses contrastive learning to map the features of the three modalities into a unified semantic space.
[0086] The core idea of cross-modal alignment is that for the same company, its industry characteristics, cultural characteristics, and business characteristics describe different aspects of the same entity, and therefore should be close to each other in a unified semantic space. However, for different companies, even if they are similar in the same modality (e.g., belonging to the same manufacturing industry), if their cultures and businesses differ significantly, they should remain distant in the unified space. In specific implementation, the module constructs a three-modal contrastive learning framework: taking companies as units, the industry vector, cultural vector, and business vector of the same company are used as positive sample triples; the modal vectors of other companies are randomly sampled as negative samples. Training is performed using a contrastive loss function (such as InfoNCE), encouraging the model to bring positive samples closer together in the embedding space and push negative samples further apart. This process can be formally represented as:
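[0087] $$L = -\log \frac{\exp\left(\mathrm{sim}(v_{ind}, v_{cul})/\tau\right) + \exp\left(\mathrm{sim}(v_{ind}, v_{bus})/\tau\right) + \exp\left(\mathrm{sim}(v_{cul}, v_{bus})/\tau\right)}{\sum_{v_j \in N} \exp\left(\mathrm{sim}(v_{ind}, v_j)/\tau\right)}$$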
[0088] Where: L is the contrastive loss value, which measures the relative distance between positive and negative sample pairs in the current batch; the training objective is to minimize this loss;
[0089] log: the natural logarithm function, which turns products into sums and thereby simplifies gradient computation;
[0090] exp(·): the exponential function, which converts similarities into non-negative weights and amplifies differences between them;
[0091] sim(·,·): the cosine similarity function, defined as sim(a, b) = (a · b) / (‖a‖ ‖b‖), which measures the cosine of the angle between two vectors; its value lies in [-1, 1], and larger values indicate more similar vectors;
[0092] v_ind: the industry feature vector, of dimension d, obtained by encoding long texts such as industry research reports with the multimodal semantic analysis engine;
[0093] v_cul: the cultural value dimension vector, of dimension d, obtained by encoding short texts such as corporate culture manuals with the Sentence-BERT model;
[0094] v_bus: the business capability graph vector, of dimension d, obtained by encoding business process SOP documents with the graph neural network;
[0095] τ: the temperature parameter, a positive real-valued hyperparameter that controls the smoothness of the similarity distribution. The smaller τ is, the sharper the output of the exponential function and the more the model focuses on the most similar positive and negative samples; the larger τ is, the smoother the distribution and the more evenly attention is spread across all samples;
[0096] N: the set of negative samples, containing modality vectors of other companies randomly sampled from the current batch. Specifically, with v_ind as the anchor, N includes the industry, culture, and business vectors of other companies, so each negative sample v_j may come from a different modality of a different company;
[0097] v_j: the j-th vector in the negative sample set N, representing one modality feature of another company.
[0098] The design principle of this loss function is as follows. The numerator sums the exponentiated similarities between each pair of the three modality vectors of the same enterprise, representing the cumulative contribution of the positive pairs. The denominator sums the exponentiated similarities between the anchor v_ind and all negative samples, representing the cumulative background of the negatives. Minimizing the negative logarithm of this ratio drives the model to increase the numerator, bringing the different modality vectors of the same enterprise closer together. It also drives the model to decrease the denominator, pushing vectors of other enterprises away, particularly relative to the anchor v_ind. This mechanism achieves cross-modal semantic alignment, making the modality features in the final multi-dimensional digital profile of an enterprise comparable and additive in a unified space.
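The alignment loss as described in [0098] can be computed for a single enterprise in a few lines. The numerator covers the three intra-enterprise pairs, and the denominator covers the anchor v_ind against the negatives. The temperature τ = 0.1 and the toy 2-D vectors are assumptions. Only the loss value is computed here; in training, gradients would come from an autodiff framework.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def alignment_loss(v_ind, v_cul, v_bus, negatives, tau=0.1):
    """L = -log( sum over the 3 intra-enterprise pairs of exp(sim/tau)
                 / sum over negatives v_j of exp(sim(v_ind, v_j)/tau) )."""
    positive_pairs = [(v_ind, v_cul), (v_ind, v_bus), (v_cul, v_bus)]
    numerator = sum(math.exp(cosine(a, b) / tau) for a, b in positive_pairs)
    denominator = sum(math.exp(cosine(v_ind, v_j) / tau) for v_j in negatives)
    return -math.log(numerator / denominator)


negatives = [(0.0, 1.0), (-1.0, 0.0)]   # modality vectors of other companies
aligned = alignment_loss((1.0, 0.0), (1.0, 0.1), (0.9, 0.0), negatives)
misaligned = alignment_loss((1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), negatives)
print(aligned < misaligned)  # True: aligned modalities give a lower loss
```

Because the denominator as written contains only negatives, L can fall below zero. The common InfoNCE formulation also places the positive terms in the denominator, which keeps the loss non-negative.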
[0099] Through contrastive training on a large amount of enterprise data, the encoders of the three modalities are adjusted jointly. The cross-modal representations of the same enterprise then converge in the vector space, achieving semantic unification.
[0100] During the feature fusion stage, the module employs a hierarchical attention mechanism to dynamically weight and fuse the aligned multi-source features. The reason for needing an attention mechanism instead of simple averaging or concatenation is that the industry attributes, business models, and cultural characteristics of different companies have significantly different weights in influencing the subsequent learning system construction. For example, for a technology startup, the weights of R&D innovation capabilities and innovation-oriented cultural features should be much higher than those of manufacturing features; while for a traditional manufacturing company, process flow features and quality control features are more critical. Static weight allocation cannot adapt to such dynamic changes, while the attention mechanism can adaptively calculate the contribution of each feature based on the actual content of the input data.
[0101] In practical implementation, the industry feature vector, business capability graph vector, and cultural value dimension vector are first used as inputs to three feature subspaces. A multi-head self-attention mechanism is introduced to calculate the interaction weights between different feature subspaces. This step allows each feature to refer to information from other features when updating its own representation—for example, when the cultural vector is finally expressed, it can perceive that the current company's industry attribute is "finance," thereby adjusting the emphasis on dimensions such as "risk awareness" and "compliance culture." The multi-head mechanism allows the model to capture the interaction patterns between features from multiple perspectives.
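The multi-head self-attention step in [0101] can be sketched by treating the three aligned feature domains as a length-3 sequence. Each domain's updated vector then mixes in information from the other two. The dimension d = 8, the two heads, and the random projection matrices are illustrative assumptions; in the system these would be learned.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, wq, wk, wv, wo, num_heads):
    """x: (3, d) rows = industry, business-graph and culture vectors.
    Returns updated rows (3, d) and per-head attention weights (heads, 3, 3)."""
    n, d = x.shape
    assert d % num_heads == 0, "d must be divisible by num_heads"
    dh = d // num_heads

    def split(m):  # (n, d) -> (heads, n, dh)
        return m.reshape(n, num_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, n, n)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)        # concatenate heads
    return out @ wo, attn


rng = np.random.default_rng(1)
d = 8
x = rng.normal(size=(3, d))
wq, wk, wv, wo = (rng.normal(size=(d, d)) for _ in range(4))
updated, attn = multi_head_self_attention(x, wq, wk, wv, wo, num_heads=2)
print(attn[0].round(2))  # how strongly each domain attends to the others in head 0
```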
[0102] Next, an importance weight for each feature domain is learned with the SENet (Squeeze-and-Excitation Networks) architecture. SENet first compresses each feature vector into a scalar by global average pooling (the Squeeze operation). It then learns an activation weight for each feature channel through two fully connected layers (the Excitation operation). Finally, it rescales each original feature by its learned weight. This process can be expressed by the following formula:
[0103] $$s = \sigma\left(W_2\,\mathrm{ReLU}(W_1 z)\right)$$
[0104] Where z is the compressed feature descriptor, W1 and W2 are learnable parameters, σ is the sigmoid activation function, and the output s contains the weight coefficient of each feature domain. Each feature domain vector is then multiplied by its corresponding coefficient in s. The core value of SENet is that it uses global information to learn which feature domains to "focus on" and which to "suppress". This dynamic recalibration lets the final profile representation adjust the distribution of feature importance to the actual situation of each enterprise.
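The Squeeze-and-Excitation recalibration over the three feature domains can be sketched in NumPy. Mean pooling gives z, FC-ReLU-FC-sigmoid gives s, and each domain vector is scaled by its s_k. The hidden size of 2 (SENet's reduction ratio is not specified in the text) and the random weights are assumptions. The last line mirrors the concatenation described in [0105].

```python
import numpy as np


def se_reweight(features, w1, w2):
    """Squeeze-and-Excitation over three feature domains.
    features: (3, d); w1: (r, 3); w2: (3, r). Returns reweighted features and s."""
    z = features.mean(axis=1)                                   # squeeze: one scalar per domain
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))  # FC-ReLU-FC-sigmoid
    return features * s[:, None], s                             # rescale each domain vector by s_k


rng = np.random.default_rng(2)
features = rng.normal(size=(3, 8))   # aligned industry, business-graph, culture vectors
w1, w2 = rng.normal(size=(2, 3)), rng.normal(size=(3, 2))
weighted, s = se_reweight(features, w1, w2)
profile_vector = weighted.reshape(-1)  # concatenation: length 3 * d
print(s.round(3), profile_vector.shape)
```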
[0105] The weighted industry feature vector, business capability graph vector, and cultural value dimension vector are concatenated to form the final multi-dimensional digital profile vector of the enterprise. This vector is also stored in a vector database for subsequent modules to perform similarity retrieval and feature input, and comes with interpretable tagged representations, such as "Industry = High-end equipment manufacturing", "Development stage = Rapid growth period", "Culture orientation = Results-oriented", "Core capability weakness = Digital marketing", etc. The generation of tagged representations is achieved by matching the profile vector with predefined tag prototype vectors based on similarity, which facilitates understanding and intervention by business personnel.
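The tag generation in [0105] matches the profile vector against predefined prototype vectors by similarity. A minimal sketch follows. The prototype vectors, the 3-D toy space, and the 0.5 acceptance threshold are hypothetical; real prototypes would live in the vector database alongside the profile.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def assign_tags(profile, prototypes, threshold=0.5):
    """For each tag dimension, pick the prototype label most similar to the profile;
    leave the dimension untagged if even the best match is below `threshold`."""
    tags = {}
    for dimension, labels in prototypes.items():
        label, score = max(((lbl, cosine(profile, vec)) for lbl, vec in labels.items()),
                           key=lambda item: item[1])
        if score >= threshold:
            tags[dimension] = label
    return tags


prototypes = {  # hypothetical prototype vectors
    "Culture orientation": {"Results-oriented": [1.0, 0.0, 0.0], "Innovation-driven": [0.0, 1.0, 0.0]},
    "Industry": {"Finance": [0.0, 0.0, 1.0]},
}
print(assign_tags([0.9, 0.1, 0.0], prototypes))  # {'Culture orientation': 'Results-oriented'}
```

The threshold keeps weak matches from producing misleading labels for business users, at the cost of leaving some dimensions untagged.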
[0106] At this point, the enterprise profiling module has completed the full transformation from raw multi-source data to structured digital profiles. The design logic of the entire module reflects a deep adaptation to enterprise workplace learning scenarios: distributed web crawlers ensure the scale and compliance of external knowledge collection; a multi-model hybrid architecture adapts to the semantic characteristics of different text types; graph neural networks preserve process structure information; cross-modal alignment solves the comparability problem of heterogeneous features; and a hierarchical attention mechanism achieves dynamic adaptive weighting. The combination of these technologies is not a simple patchwork, but a systematic solution built to address the technical problem of "how to accurately, completely, and computably express the multidimensional characteristics of a complex organization like an enterprise," providing high-quality data input for the personalized generation of subsequent learning systems.
[0107] In one specific implementation, the dynamic learning system generation module takes the multi-dimensional digital profile output by the enterprise profile construction module as its core input. It combines this profile with a preset general workplace competency framework and generates a personalized enterprise learning system through the reasoning capabilities of a large language model. The module's operation spans a complete technical chain from input parsing to system output. Its core challenge is transforming the abstract enterprise profile into a structured, executable learning path. This requires solving a series of technical problems, including prompt engineering, external knowledge injection, output format standardization, and dynamic evolution of the system.
[0108] At the input processing level, the module first receives a multi-dimensional digital profile of the enterprise from the enterprise profile building module. This profile is stored in a vector database in vector form and includes interpretable tagged representations. However, large language models cannot directly understand high-dimensional vectors, so the profile needs to be converted into natural language prompts that the model can process. This conversion process is not a simple tag concatenation, but requires the construction of a structured prompt template to organize key information in the profile, such as industry characteristics, business capability maps, cultural value dimensions, and capability shortcomings, into a hierarchical semantic representation. Specifically, the prompt template adopts a three-part structure: the first part defines the task objective, namely, "to generate a personalized learning system based on the following enterprise characteristics"; the middle part lists the key attributes of the enterprise profile in key-value pairs, including industry category, development stage, core business pain points, cultural orientation, and key capability shortcomings; the last part clarifies the output format requirements, specifying that the generated content should include three parts: a capability model, a job learning path map, and key knowledge domains, and specifies the organization method of each part.
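The three-part prompt template in [0108] can be sketched as a small function: task objective, key-value profile attributes, then output format requirements. The exact prompt wording and the key names are assumptions chosen to match the attributes listed in the text. Rejecting incomplete profiles up front keeps the model from silently inventing missing attributes.

```python
REQUIRED_KEYS = (
    "Industry category",
    "Development stage",
    "Core business pain points",
    "Cultural orientation",
    "Key capability shortcomings",
)


def build_prompt(profile: dict) -> str:
    """Assemble the three-part prompt: task objective, key-value profile, output format."""
    missing = [key for key in REQUIRED_KEYS if key not in profile]
    if missing:
        raise ValueError(f"profile is missing: {', '.join(missing)}")
    task = ("Task: Generate a personalized learning system based on the following "
            "enterprise characteristics.")
    attributes = "\n".join(f"- {key}: {profile[key]}" for key in REQUIRED_KEYS)
    output_format = (
        "Output format: Return exactly three parts: (1) a capability model, "
        "(2) a job learning path map, (3) key knowledge domains. "
        "Organize the capability model as domains -> competencies -> indicators."
    )
    return f"{task}\n\nEnterprise profile:\n{attributes}\n\n{output_format}"


print(build_prompt({
    "Industry category": "Retail",
    "Development stage": "Rapid growth period",
    "Core business pain points": "Low online conversion",
    "Cultural orientation": "Customer-centric",
    "Key capability shortcomings": "Digital marketing",
}))
```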
[0109] The prompts use Chain-of-Thought prompting to guide the large model through the task step by step. The core idea of Chain-of-Thought is to demonstrate intermediate reasoning steps within the prompt. The model then imitates a step-by-step human thought process, which improves accuracy on complex reasoning tasks. Research shows that Chain-of-Thought prompting significantly improves the performance of large models on arithmetic, commonsense, and symbolic reasoning tasks. Its mechanism may lie in guiding output generation through answer templates and constraining the decoding space. In this module, Chain-of-Thought is designed as a four-step reasoning process. First, based on the industry characteristics in the profile, identify the typical business scenarios and core positions in the industry. This step draws on the industry common-sense knowledge the model acquired during pre-training. For example, in the retail industry, typical business scenarios include store operations, supply chain management, and customer service, while core positions include store managers, purchasing specialists, and sales associates. Second, based on the business capability graph in the profile, assess the enterprise's current capabilities. The business capability graph represents, in vector form, the capability requirements and actual performance of each node in the enterprise's core business processes. Comparing the graph vectors with preset capability benchmark vectors allows capability gaps to be identified quantitatively. Third, based on the cultural value dimension in the profile, determine which value orientation the learning content should emphasize. The corporate culture value dimension vector is converted into interpretable value labels, such as "results-oriented," "innovation-driven," and "customer-centric," by similarity matching against predefined value prototype vectors.
These labels will influence the style bias and case selection of the learning content. The fourth step is to synthesize the above analysis to generate a capability model, a job learning path map, and key knowledge areas.
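The quantitative gap identification in the second reasoning step can be sketched under a simplifying assumption. Each vector component is treated as a named capability with a comparable score, and the gap is the shortfall below the benchmark. The text only says graph vectors are compared with benchmark vectors, so the per-dimension mapping, the dimension names, and the top-k cut-off are hypothetical.

```python
def capability_gaps(actual, benchmark, dimensions, top_k=3):
    """Per-dimension shortfall max(benchmark - actual, 0), largest first; zero gaps dropped."""
    gaps = [(dim, round(max(b - a, 0.0), 6)) for dim, a, b in zip(dimensions, actual, benchmark)]
    ranked = sorted((g for g in gaps if g[1] > 0), key=lambda g: g[1], reverse=True)
    return ranked[:top_k]


dims = ["Digital marketing", "Supply chain management", "Customer service"]
print(capability_gaps([0.3, 0.8, 0.6], [0.8, 0.7, 0.75], dims))
# [('Digital marketing', 0.5), ('Customer service', 0.15)]
```

A ranked list like this is the kind of intermediate result the second step would pass to the fourth step, where it shapes which capabilities the generated model emphasizes.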
[0110] In actual operation, the module uses a Retrieval-Augmented Generation (RAG) architecture to improve the professionalism and accuracy of the generated content. RAG combines an external knowledge base with a language model. This overcomes the knowledge cutoff and limited domain coverage of knowledge stored only in model parameters. Specifically, before the prompt is sent to the large language model, the module runs a vector similarity search over the enterprise knowledge base. This retrieves the reference cases most relevant to the current enterprise profile. The enterprise knowledge base is a continuously growing collection of knowledge assets with three main parts. The industry best practices library contains capability models, training systems, and performance improvement cases from benchmark companies across industries. The historical training record library stores past training plans, course feedback, and effectiveness evaluation data. The performance improvement case library records cases in which specific business problems were successfully solved through training. All of these knowledge assets are vectorized and stored in the vector database shared with the profile building module.
[0111] The retrieval process uses a dual recall strategy to balance relevance and diversity. First, a global similarity search based on the enterprise profile vector recalls the top K cases most similar to the current enterprise in industry attributes, development stage, and capability shortcomings. Then, a targeted search on specific profile dimensions (such as cultural values) ensures that the recalled cases also match the target enterprise's cultural orientation. After reranking, the N most relevant cases are selected as context and incorporated into the prompt. The rationale for this design is that relying solely on the parametric knowledge of a large language model may produce content that is overly generic or deviates from the enterprise's actual situation. Retrieving enterprise-specific cases anchors the direction of generation. The learning system then both follows general industry rules and fits the specific enterprise scenario. For example, consider a retail enterprise undergoing digital transformation. The system may retrieve the digital capability models of benchmark enterprises in the same industry as references. It then combines them with specific indicators such as the enterprise's current store coverage and online penetration rate to generate a customized capability model with distinctive capability dimensions such as "omnichannel operation" and "customer data insights."
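The dual recall strategy in [0111] can be sketched over an in-memory case list. A global recall runs on the profile vector and a targeted recall on the culture vector. The results are de-duplicated, reranked, and cut to N. The reranking score (an equal-weight blend of the two cosine similarities), the field names, and the toy vectors are assumptions. A production system might use a dedicated reranking model instead.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query, cases, field, k):
    return sorted(cases, key=lambda c: cosine(query, c[field]), reverse=True)[:k]


def dual_recall(profile_vec, culture_vec, cases, k=2, n=2, alpha=0.5):
    """Global recall on the profile vector plus targeted recall on the culture vector,
    de-duplicated by id, then reranked by a weighted blend of both similarities."""
    recalled = {c["id"]: c for c in top_k(profile_vec, cases, "profile_vec", k)
                + top_k(culture_vec, cases, "culture_vec", k)}

    def rerank_score(c):
        return (alpha * cosine(profile_vec, c["profile_vec"])
                + (1 - alpha) * cosine(culture_vec, c["culture_vec"]))

    return [c["id"] for c in sorted(recalled.values(), key=rerank_score, reverse=True)[:n]]


cases = [
    {"id": "A", "profile_vec": [1.0, 0.0], "culture_vec": [1.0, 0.0]},
    {"id": "B", "profile_vec": [0.8, 0.6], "culture_vec": [0.0, 1.0]},
    {"id": "C", "profile_vec": [0.0, 1.0], "culture_vec": [1.0, 0.0]},
    {"id": "D", "profile_vec": [-1.0, 0.0], "culture_vec": [-1.0, 0.0]},
]
print(dual_recall([1.0, 0.0], [1.0, 0.0], cases))  # ['A', 'C']
```

Case C is reachable only through the culture recall, which illustrates how the second channel adds diversity that the global search alone would miss.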
[0112] In practical deployment, the retrieval-augmented generation architecture faces two key engineering challenges. The first is keeping the knowledge base up to date in real time; the second is ensuring the structural quality of its content. For real-time updates, the module establishes an event-triggered automated synchronization pipeline integrated with the enterprise's internal knowledge management systems (such as Confluence, GitLab, and an enterprise wiki). When a source knowledge document is added, deleted, or modified, the system captures the change event in real time via a webhook. It then automatically updates the corresponding index in the vector database within 30 seconds, so that search results are always based on the latest data. For structural quality, the module applies a unified preprocessing standard to knowledge documents before they enter the database, converting PDF, Word, and other formats to Markdown. Intelligent parsing identifies heading levels, paragraph boundaries, table structures, and code blocks, preserving the semantic integrity of each information chunk. Compared with simple fixed-length chunking by character count, this structured processing improves the accuracy of key information identification by more than 40%. It also avoids the semantic loss caused by truncating information mid-unit.
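The structure-aware chunking in [0112] can be sketched for documents that have already been converted to Markdown. Headings define chunk boundaries, fenced code blocks are kept intact, and each chunk records its heading path for retrieval context. The function name and chunk layout are assumptions, and table detection and the webhook-driven re-indexing are not shown.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # three backticks, written indirectly so this source can sit inside a Markdown fence


def chunk_markdown(markdown: str) -> list:
    """Split Markdown at headings; each chunk records its heading path.
    Lines inside fenced code blocks are never treated as headings."""
    chunks, path, buffer = [], [], []
    in_code = False

    def flush():
        text = "\n".join(buffer).strip()
        if text:
            chunks.append({"heading_path": " > ".join(path), "text": text})
        buffer.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_code = not in_code
            buffer.append(line)
            continue
        match = None if in_code else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2).strip())
        else:
            buffer.append(line)
    flush()
    return chunks


doc = "\n".join([
    "# Travel Policy", "Applies to all staff.",
    "## Booking", "Book through the portal.", FENCE, "# not a heading", FENCE,
    "## Reimbursement", "Submit receipts within 30 days.",
])
for chunk in chunk_markdown(doc):
    print(chunk["heading_path"], "|", chunk["text"].splitlines()[0])
```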
[0113] The generated learning system is output in JSON-LD format. JSON-LD (JavaScript Object Notation for Linked Data) is a lightweight linked data format based on JSON, capable of expressing entities and their relationships in a machine-readable manner. The core advantages of using the JSON-LD format are: firstly, it combines the simplicity of JSON with the semantic expressiveness of linked data, clearly defining hierarchical relationships and mappings in the capability model; secondly, it is widely supported by mainstream search engines and AI systems, facilitating interoperability and reuse of the learning system across different platforms; and thirdly, it supports referencing external vocabularies (such as schema.org) via the @context field, enabling seamless integration of the output learning system with other knowledge graph systems.
[0114] The JSON-LD structure of the learning system contains a multi-level organization. The top level is the competency domain, identified by `@type: "CompetencyDomain"`, containing attributes such as `domainName` (domain name, e.g., "Leadership", "Professional Skills", "General Skills"), `domainDescription` (domain description), and `hasCompetency` (a reference to a middle-level competency item). The middle level consists of competencies, identified by `@type: "Competency"`, containing attributes such as `competencyName` (competency name, e.g., "Team Motivation", "Data Analysis", "Communication"), `competencyDescription` (competency description), `competencyLevel` (competency level, e.g., beginner / intermediate / advanced), and `hasIndicator` (a reference to a lower-level competency indicator). The bottom level consists of competency indicators, identified by `@type: "CompetencyIndicator"`, containing attributes such as `indicatorName` (indicator name, e.g., "Able to analyze team dynamics using at least three motivational theories"), `learningObjective` (the corresponding learning objective), and `assessmentMethod` (assessment method). This multi-layered nested structure clearly expresses the complete spectrum from macro-level capabilities to micro-level measurable indicators.
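The three-level JSON-LD structure in [0114] can be sketched with plain dictionaries and the standard `json` module. The type and property names (`CompetencyDomain`, `hasCompetency`, `indicatorName`, and so on) follow the text. The `@vocab` IRI is hypothetical, and the descriptions, objective, and assessment method are illustrative. Child objects are embedded inline here; referencing them by `@id` would be an equally valid JSON-LD layout.

```python
import json

# Hypothetical vocabulary IRI; a deployment could instead map terms to schema.org.
CONTEXT = {"@vocab": "https://example.org/learning-vocab#"}


def indicator(name, objective, method):
    return {"@type": "CompetencyIndicator", "indicatorName": name,
            "learningObjective": objective, "assessmentMethod": method}


def competency(name, description, level, indicators):
    return {"@type": "Competency", "competencyName": name,
            "competencyDescription": description, "competencyLevel": level,
            "hasIndicator": indicators}


def domain(name, description, competencies):
    return {"@context": CONTEXT, "@type": "CompetencyDomain", "domainName": name,
            "domainDescription": description, "hasCompetency": competencies}


leadership = domain(
    "Leadership", "Capabilities for leading and developing teams.",
    [competency("Team Motivation", "Sustaining team engagement and performance.", "intermediate",
                [indicator("Able to analyze team dynamics using at least three motivational theories",
                           "Apply motivational theories to a real team scenario",
                           "Case analysis graded against a rubric")])],
)
print(json.dumps(leadership, ensure_ascii=False, indent=2))
```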
[0115] The job learning path graph constructs a mapping relationship between job positions, skills, and learning content, stored in a graph database using a directed graph structure. Nodes in the graph database represent learning units, each containing attributes such as unit ID, unit type (micro-course / case study / simulation exercise, etc.), associated skills, and estimated learning duration. Edges represent prerequisite and dependency relationships between learning units, with edge attributes including relationship type (e.g., "prerequisite," "progressive," "parallel optional") and relationship strength weight. The advantages of using a graph structure to store job learning paths are: firstly, graph models are naturally suitable for expressing complex dependency networks, intuitively presenting branches and convergences in the learning path; secondly, graph query languages (such as Cypher) support efficient path retrieval and reachability analysis, facilitating subsequent recommendation engines to dynamically calculate the optimal next step based on the employee's current learning progress; and thirdly, graph structures facilitate subsequent expansion and reconstruction. When corporate strategy adjustments require the insertion of new learning units, only nodes need to be added and relevant edge relationships updated, without reconstructing the entire data model.
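The "optimal next step" query in [0115] can be illustrated in Python, with the graph held in memory rather than in a graph database. A unit becomes available once all of its gating predecessors are completed. Treating "prerequisite" and "progressive" edges as gating and "parallel optional" edges as non-gating is an assumption. Relationship-strength weights are ignored in this sketch.

```python
from collections import defaultdict

# Assumption: "prerequisite" and "progressive" edges gate access to the target unit,
# while "parallel optional" edges do not.
BLOCKING = {"prerequisite", "progressive"}


def next_units(units, edges, completed):
    """Units not yet completed whose blocking predecessors are all completed."""
    required = defaultdict(set)
    for src, dst, relation in edges:
        if relation in BLOCKING:
            required[dst].add(src)
    return sorted(u for u in units if u not in completed and required[u] <= completed)


units = {"U1", "U2", "U3", "U4"}
edges = [("U1", "U2", "prerequisite"), ("U2", "U3", "progressive"), ("U1", "U4", "parallel optional")]
print(next_units(units, edges, completed={"U1"}))  # ['U2', 'U4']
```

In the graph database the same check would be expressed as a path query over the stored edges. The in-memory version is used here only to keep the example self-contained.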
[0116] The key knowledge domain is a set of knowledge points extracted from the competency model, stored in a graph database in the form of a knowledge graph and linked to the job learning path map. Each knowledge point node includes attributes: knowledge point ID, knowledge point name, knowledge point description, associated competency item ID, relevance to the business scenario (obtained by calculating the cosine similarity between the knowledge point vector and the business scenario vector), difficulty coefficient (calculated based on the density of professional terms and logical complexity in the knowledge point), estimated learning time (derived based on historical learning data statistics), and associated copyrighted content source (pointing to the original content fragment in the multimodal content extraction module). The construction process of the key knowledge domain involves mapping from competency indicators to specific knowledge points: each competency indicator corresponds to one or more knowledge points, and knowledge points are interconnected through relationship types such as "prerequisite knowledge," "extended knowledge," and "applied knowledge," forming a complete knowledge network covering the competency model. This networked organization enables the subsequent personalized recommendation engine to make inferential recommendations based on the relationships between knowledge points. For example, "Since the user has mastered knowledge point A, and knowledge point B is prerequisite knowledge of A, there is no need to recommend B; knowledge point C has a strong relationship with A and can be recommended as extended content."
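The inferential recommendation example at the end of [0116] can be sketched as two small functions. The tuple layout `(source, target, relation, strength)` and the 0.7 strength threshold are hypothetical. In this sketch, prerequisite knowledge of a mastered point is treated as already known, transitively, and extended knowledge strongly linked to a mastered point is recommended.

```python
def known_points(mastered, relations):
    """Mastered points plus, transitively, all of their prerequisite knowledge."""
    known = set(mastered)
    changed = True
    while changed:
        changed = False
        for src, dst, relation, _ in relations:
            # (src, dst, "prerequisite", w) means src is prerequisite knowledge of dst.
            if relation == "prerequisite" and dst in known and src not in known:
                known.add(src)
                changed = True
    return known


def recommend(mastered, relations, min_strength=0.7):
    """Extended knowledge strongly linked to a mastered point and not already known."""
    known = known_points(mastered, relations)
    return sorted({dst for src, dst, relation, strength in relations
                   if relation == "extended" and src in mastered
                   and strength >= min_strength and dst not in known})


relations = [
    ("B", "A", "prerequisite", 1.0),
    ("E", "B", "prerequisite", 1.0),
    ("A", "C", "extended", 0.9),
    ("A", "D", "extended", 0.3),
]
print(recommend({"A"}, relations))  # ['C']
```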
[0117] The module also incorporates a system evolution mechanism that enables dynamic iteration of the learning system. The mechanism has two trigger conditions: periodic triggering and event triggering. Periodic triggering automatically runs the system reconstruction process on a preset cycle (e.g., quarterly). It compares the latest enterprise profile output by the enterprise profile building module with historical profiles, identifies significant changes in profile dimensions, and triggers regeneration of the learning system. Event triggering responds to real-time events in the enterprise's business systems, such as new product line launches, organizational restructuring, and major strategic announcements. When such an event is detected, the system automatically extracts the relevant business data and updates the corresponding dimensions of the enterprise profile. It then immediately triggers an incremental update of the affected parts of the learning system rather than a global reconstruction. This incremental strategy reflects the fact that frequent full reconstructions consume significant computing resources and may reduce the stability of the learning system. Event-triggered local updates, by contrast, respond quickly to business changes while keeping the main system stable, so that learning content in key areas stays synchronized with the latest strategy.
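The two trigger paths in [0117] can be sketched as a change check followed by a dispatch. A periodic comparison of old and new profiles leads to full regeneration. A business event leads to an incremental update of only the changed dimensions. Measuring drift as 1 minus cosine similarity, the 0.2 threshold, and the dimension names are all assumptions for illustration.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def changed_dimensions(old, new, threshold=0.2):
    """Profile dimensions that are new or whose vector drifted by more than `threshold`,
    with drift measured as 1 - cosine similarity."""
    return sorted(dim for dim, vec in new.items()
                  if dim not in old or 1.0 - cosine(old[dim], vec) > threshold)


def plan_update(trigger, old, new, threshold=0.2):
    """Periodic checks regenerate the whole system; business events update only the
    affected dimensions."""
    dims = changed_dimensions(old, new, threshold)
    if not dims:
        return ("none", [])
    if trigger == "periodic":
        return ("full_regeneration", dims)
    if trigger == "event":
        return ("incremental_update", dims)
    raise ValueError(f"unknown trigger: {trigger}")


old = {"industry": [1.0, 0.0], "culture": [0.0, 1.0]}
new = {"industry": [1.0, 0.0], "culture": [1.0, 1.0]}
print(plan_update("event", old, new))  # ('incremental_update', ['culture'])
```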
[0118] In implementation, the system evolution mechanism uses a version control strategy to manage the iteration history of the learning system. Each generated system version is saved as a complete snapshot together with metadata such as version number, effective date, associated enterprise profile version, and change log. When a new version is generated, the system supports A/B testing: a subset of employees or departments is selected as the experimental group and uses the new system, while the control group continues to use the old one. The optimization effect of the new system is evaluated by comparing the learning outcomes and business performance indicators of the two groups. A validated new version is gradually rolled out to a wider scope until it reaches full deployment. The version control mechanism also supports rollback: if a significant defect is discovered in the new system, the system can quickly switch back to a stable version, ensuring business continuity.
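The snapshot, A/B assignment, rollout, and rollback steps can be sketched as a small in-memory registry. The class and method names are hypothetical, and the hash-based bucketing used to pick the experimental group is an assumption (it keeps each employee or department in the same group across requests); the text does not say how groups are chosen.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

@dataclass(frozen=True)
class SystemVersion:
    number: int
    effective_date: date
    profile_version: str
    changelog: str
    snapshot: Dict[str, list]   # full copy of the learning system

class VersionRegistry:
    def __init__(self) -> None:
        self.versions: List[SystemVersion] = []
        self.stable: Optional[int] = None     # version served to the control group
        self.candidate: Optional[int] = None  # version under A/B test

    def publish(self, snapshot: dict, profile_version: str, changelog: str, when: date) -> int:
        version = SystemVersion(len(self.versions) + 1, when, profile_version, changelog, dict(snapshot))
        self.versions.append(version)
        if self.stable is None:
            self.stable = version.number
        else:
            self.candidate = version.number
        return version.number

    def assign(self, group_id: str, experiment_share: float) -> int:
        """Deterministically route a share of employees or departments to the candidate."""
        if self.candidate is None:
            return self.stable
        bucket = int(hashlib.sha256(group_id.encode()).hexdigest(), 16) % 100
        return self.candidate if bucket < experiment_share * 100 else self.stable

    def promote(self) -> None:
        """Full rollout once the candidate has been validated."""
        if self.candidate is not None:
            self.stable, self.candidate = self.candidate, None

    def rollback(self, number: int) -> None:
        """Switch everyone back to an earlier stable version."""
        if not 1 <= number <= len(self.versions):
            raise ValueError(f"no such version: {number}")
        self.stable, self.candidate = number, None
```

A gradual rollout corresponds to raising `experiment_share` step by step before calling `promote()`.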
[0119] Compared with traditional static learning systems, this dynamic generation mechanism propagates changes in corporate strategy to the training side promptly, keeping talent development aligned with business development. Traditional learning systems tend to remain static once established: when the business direction changes, new products are launched, or market strategy shifts, training content struggles to respond in time, so that training trails the business ("training following the business") instead of leading it ("training leading business development"). Through dynamic updates to the enterprise profile and automatic evolution of the learning system, this module keeps the learning system evolving in step with corporate strategy, addressing the technical problem of training being disconnected from business operations.
[0120] Multimodal content extraction and AIGC production module
[0121] In one specific implementation, the multimodal content extraction and AIGC production module undertakes the core function of transforming high-quality knowledge resources from CITIC Publishing's proprietary copyrighted content library into lightweight knowledge units suitable for workplace learning. This is a crucial link in realizing knowledge assetization and content reproduction. The module's workflow involves a complete technical chain, from preprocessing the original copyrighted content and constructing vectorized indexes, to precise retrieval based on RAG, multi-agent collaborative content reorganization and rewriting, and finally to knowledge tracing and quality assessment. The technical means at each stage are systematically designed to meet the efficiency, quality, and compliance requirements of content production in workplace learning scenarios.
[0122] During the preprocessing and vectorized indexing phase of copyrighted content, the module needs to handle raw knowledge assets of various modalities, including e-books, audio and video, charts, etc. These different modalities of content have very different structural features and semantic expressions, so differentiated processing strategies are required.
[0123] For e-book content, the module first performs format parsing and text extraction, converting e-books in formats such as PDF and EPUB into plain text while preserving structural information such as chapter titles, paragraph boundaries, headers, and footers. The module then applies a paragraph-level segmentation strategy to divide long texts into semantically complete segments. This strategy must balance three factors: respecting natural chapter boundaries, keeping segment lengths computationally efficient, and preserving semantic integrity. Segmenting by a fixed character count is simple, but it easily cuts through semantically coherent sentences or paragraphs, so that fragments retrieved later lack complete context. Segmenting by paragraph preserves semantic integrity within each paragraph, but overly long paragraphs may exceed the maximum input length of the embedding model, and fine-grained knowledge points inside a paragraph are hard to locate precisely during retrieval. The module therefore uses a hybrid segmentation algorithm. First, chapters define hard segmentation boundaries, so no segment spans two chapters. Then, within each chapter, consecutive paragraphs are combined into text blocks of approximately 512 words each, with cuts made only at semantically complete positions (such as the end of a paragraph). Particularly long paragraphs are split with an overlapping sliding window, keeping 10% overlap between adjacent windows so that key information falling on a window boundary is not truncated. After segmentation, each text block is encoded into a high-dimensional vector by an embedding model (such as text-embedding-3-large) and stored in the Milvus vector database. In parallel, an inverted index is built that maps each keyword in the text blocks to the IDs of the blocks containing it, supporting subsequent keyword retrieval.
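The hybrid segmentation and the keyword index can be sketched as below, using the 512-word block size and 10% overlap from the text. This is a sketch under stated assumptions: words are taken to be whitespace-separated tokens (Chinese text would need a word segmenter first), and the in-memory `build_inverted_index` is a hypothetical stand-in for the Elasticsearch index described later; embedding and storage in Milvus are omitted.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def sliding_windows(words: List[str], size: int, overlap: float = 0.1) -> List[List[str]]:
    """Overlapping windows; adjacent windows share about `overlap` of their words."""
    step = max(1, round(size * (1 - overlap)))
    windows, start = [], 0
    while True:
        windows.append(words[start:start + size])
        if start + size >= len(words):
            return windows
        start += step

def chunk_chapter(paragraphs: List[str], max_words: int = 512, overlap: float = 0.1) -> List[str]:
    chunks: List[str] = []
    current: List[str] = []
    for paragraph in paragraphs:
        words = paragraph.split()
        if len(words) > max_words:                     # over-long paragraph: sliding window
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.extend(" ".join(w) for w in sliding_windows(words, max_words, overlap))
        elif current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))           # cut only at a paragraph end
            current = list(words)
        else:
            current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks

def chunk_book(chapters: List[Tuple[str, List[str]]], **kwargs) -> List[Tuple[str, str]]:
    """Chapters are hard boundaries: no chunk spans two chapters."""
    return [(title, chunk) for title, paragraphs in chapters
            for chunk in chunk_chapter(paragraphs, **kwargs)]

def build_inverted_index(chunks: List[Tuple[str, str]]) -> Dict[str, Set[int]]:
    index: Dict[str, Set[int]] = defaultdict(set)
    for chunk_id, (_, text) in enumerate(chunks):
        for word in text.lower().split():
            index[word].add(chunk_id)
    return index
```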
[0124] For audio and video content, the module first uses Automatic Speech Recognition (ASR) technology to convert speech into text. The ASR model adopts an end-to-end Transformer architecture and has been pre-trained on a large amount of Chinese speech data, enabling it to adapt to speech input with different accents and speaking speeds. Building upon speech-to-text conversion, the module also needs to identify semantic boundaries in the video, segmenting long videos into topic-focused segments. This task is achieved using a scene detection algorithm: the algorithm detects shot transitions by analyzing visual feature differences between adjacent frames (such as color histograms and edge distribution); simultaneously, it analyzes topic shifts in the speech-text stream, identifying topic boundaries by calculating the semantic similarity of adjacent text windows. The visual shot boundaries and semantic topic boundaries are then fused and aligned to ultimately determine the segmentation points for the video segments. Each segmented video segment is saved as an independent media file, with its ASR-transcribed text serving as the text description of that segment. This text is also used to generate vectors through an embedding model and is stored in association with the video segment's metadata (timestamp, file path).
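The boundary detection and fusion steps can be sketched as follows. The text does not specify the fusion rule, so this sketch assumes one: a topic boundary is kept only if a shot cut lies within a hypothetical tolerance of it, and the segment is then cut at that shot boundary. The half-L1 histogram difference, the similarity thresholds, and all function names are likewise illustrative assumptions; ASR and the embedding of transcript windows are taken as given.

```python
import math
from typing import List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def shot_cuts(histograms: List[List[float]], fps: float, threshold: float = 0.5) -> List[float]:
    """Shot boundaries (seconds) where adjacent normalized color histograms differ strongly."""
    cuts = []
    for i in range(1, len(histograms)):
        diff = 0.5 * sum(abs(a - b) for a, b in zip(histograms[i - 1], histograms[i]))
        if diff > threshold:
            cuts.append(i / fps)
    return cuts

def topic_shifts(window_vectors: List[List[float]], window_starts: List[float],
                 threshold: float = 0.5) -> List[float]:
    """Topic boundaries where adjacent transcript windows are semantically dissimilar."""
    return [window_starts[i] for i in range(1, len(window_vectors))
            if cosine(window_vectors[i - 1], window_vectors[i]) < threshold]

def fuse(cuts: List[float], shifts: List[float], tolerance: float = 2.0) -> List[float]:
    """Keep topic shifts confirmed by a nearby shot cut, snapped to that cut."""
    fused = set()
    for t in shifts:
        nearest = min(cuts, key=lambda c: abs(c - t), default=None)
        if nearest is not None and abs(nearest - t) <= tolerance:
            fused.add(nearest)
    return sorted(fused)

def segments(boundaries: List[float], duration: float) -> List[Tuple[float, float]]:
    points = [0.0] + [b for b in boundaries if 0.0 < b < duration] + [duration]
    return list(zip(points[:-1], points[1:]))
```

Snapping to the shot cut avoids splitting a segment in the middle of a shot, while requiring a topic shift avoids splitting at every camera change.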
[0125] For chart-type content, including illustrations, data visualization charts, and flowcharts, the core information is carried by the visual elements and their layout. Relying solely on text descriptions would lose a significant amount of semantic information. The module therefore uses a multimodal large model (such as CLIP) to extract a joint representation of the chart's visual features and text descriptions. Specifically, for each chart, the module extracts a visual feature vector v_img with CLIP's image encoder, extracts the text contained in the chart (such as titles, captions, and axis labels), and generates a text feature vector v_txt with CLIP's text encoder. The two vectors are then fused by weighting (e.g., v_chart = α·v_img + (1 − α)·v_txt, where α is an empirical value of 0.6) to form the chart's joint representation vector. This enables cross-modal semantic matching in later retrieval, whether searching for charts by text description or searching for related text by chart content.
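The weighted fusion with α = 0.6 can be sketched as follows. CLIP's image and text encoders project into a shared embedding space of the same dimension, which is what makes the element-wise sum meaningful. Normalizing each input before fusion and the result after it is an assumption of this sketch (so that neither modality dominates by magnitude), not something the text specifies; the encoders themselves are not shown.

```python
import math
from typing import List

def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def chart_embedding(v_img: List[float], v_txt: List[float], alpha: float = 0.6) -> List[float]:
    """v_chart = alpha * v_img + (1 - alpha) * v_txt, computed on unit-length inputs."""
    if len(v_img) != len(v_txt):
        raise ValueError("image and text embeddings must share one dimension")
    v_img, v_txt = l2_normalize(v_img), l2_normalize(v_txt)
    fused = [alpha * a + (1 - alpha) * b for a, b in zip(v_img, v_txt)]
    return l2_normalize(fused)  # unit length so cosine search treats all items alike
```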
[0126] The vectors and metadata of all the above content fragments are stored uniformly in the Milvus vector database. Milvus is an open-source vector database that supports efficient storage and retrieval of massive numbers of vectors. Its built-in HNSW (Hierarchical Navigable Small World) index can perform approximate nearest neighbor (ANN) retrieval over millions of vectors in milliseconds. HNSW builds a multi-layer proximity graph and searches it from the sparse top layer down to the dense bottom layer, refining the candidates at each layer, which gives a good balance between retrieval accuracy and efficiency. In addition, the module builds a keyword inverted index in Elasticsearch that supports traditional retrieval methods such as Boolean queries and phrase matching, complementing vector retrieval.
[0127] When a content production request is received, the module initiates the RAG (Retrieval-Augmented Generation) workflow. Content production requests can come from two sources: first, the key knowledge domains specified by the dynamic learning system generation module. For example, when the system generation module identifies that a company needs to strengthen its "digital marketing" capabilities, it will send a request to the content production module to produce knowledge units in that domain; second, the immediate needs identified by the personalized recommendation engine. For example, when the recommendation engine detects that an employee is having difficulty handling customer complaints, it may trigger an immediate production request for knowledge units such as "objection handling scripts".
[0128] The first step in request processing is to convert the request content into a query vector. The request content may be a natural language description of the requirements or a structured knowledge domain identifier. Regardless of the form, a query vector is generated using the same embedding model as the content fragment, ensuring that the query vector is consistent with the semantic space of the vectors in the library.
[0129] The second step involves performing an approximate nearest neighbor search in the vector database to recall the top K content fragments most semantically relevant to the request. The retrieval process employs the HNSW algorithm, which significantly reduces computational cost while maintaining recall by using a coarse-to-fine search path within a graph structure. However, relying solely on semantic similarity may lead to overly concentrated recall results from a single source or perspective—for example, if the request pertains to "leadership," dozens of recalled fragments might come from different chapters of the same classic work, ignoring complementary perspectives from other relevant books or case studies. To address this issue, a maximum marginal relevance (MMR) mechanism is introduced during the retrieval process. The core idea of the MMR algorithm is to balance relevance and diversity, formally represented as:
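[0130] $$\mathrm{MMR} = \arg\max_{D_i \in R \setminus S}\left[\lambda \cdot \mathrm{Sim}_1(D_i, Q) - (1 - \lambda) \cdot \max_{D_j \in S} \mathrm{Sim}_2(D_i, D_j)\right]$$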
[0131] Where R is the initial set of candidate segments retrieved, S is the set of selected segments, Q is the query vector, Sim1(Di, Q) is the similarity between segment Di and the query, Sim2(Di, Dj) is the similarity between segment Di and the selected segment Dj (in practice, Sim1 and Sim2 can use the same cosine similarity function), and λ is the balancing parameter (usually set to 0.5-0.7). The algorithm iteratively selects segments from the candidate set that are relevant to the query and as dissimilar as possible from the selected segments to add to the result set, thereby ensuring that the recall results cover different content sources and perspectives.
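The greedy MMR selection can be sketched directly from the formula, using cosine similarity for both Sim1 and Sim2 as the text suggests. Candidate IDs and vectors are illustrative, and the HNSW recall that produces the candidate set R is assumed to have already happened.

```python
import math
from typing import Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query: List[float], candidates: Dict[str, List[float]], k: int, lam: float = 0.6) -> List[str]:
    """Greedily pick k ids from R, trading off relevance against redundancy with S."""
    remaining = dict(candidates)   # R \ S
    selected: List[str] = []       # S, in selection order
    while remaining and len(selected) < k:
        def score(item):
            _, vec = item
            redundancy = max((cosine(vec, candidates[s]) for s in selected), default=0.0)
            return lam * cosine(vec, query) - (1 - lam) * redundancy
        best_id, _ = max(remaining.items(), key=score)
        selected.append(best_id)
        del remaining[best_id]
    return selected

# "b" is a near-duplicate of "a"; MMR prefers the more distinct "c" as the second pick.
pool = {"a": [1.0, 0.1], "b": [1.0, 0.12], "c": [1.0, -0.3]}
print(mmr([1.0, 0.0], pool, k=2, lam=0.5))  # ['a', 'c']
```

Ranking by relevance alone would return `a` and `b`, two nearly identical fragments; the redundancy term is what pulls in `c`.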
[0132] The recalled original fragments enter the content reorganization and rewriting stage. This stage employs a multi-agent collaborative architecture, where multiple specialized agents collaborate to transform the original content into workplace knowledge units. The multi-agent architecture is designed based on the idea of complex task decomposition: the comprehensive task of "content rewriting" is broken down into relatively independent sub-tasks such as logical analysis, terminology explanation, and format adaptation, which are handled by specialized agents respectively. Finally, a coordination mechanism integrates and outputs the results. This division of labor improves the quality and consistency of the generated content.
[0133] The first agent is responsible for analyzing the logical structure and core arguments of the original text fragment. This agent receives the original text fragment as input and extracts its text structure using the reading comprehension capabilities of a large language model. This structure includes the introduction, statement of arguments, development of evidence, case analysis, and conclusion. Based on this, the agent outputs a summary skeleton, presenting the logical flow of the fragment in Markdown list format. For example, for an article fragment discussing "situational leadership," the summary skeleton might include: a definition of the core concept (the connotation of situational leadership), the applicable conditions for the four leadership styles, a classic case (an example of a company successfully applying situational leadership), and practical suggestions. This summary skeleton retains the core information of the original text while removing redundant embellishments, providing a structured foundation for subsequent rewriting.
[0134] The second agent is responsible for identifying key terms and concepts in the passage and supplementing them with explanatory content. This agent first extracts technical terms from the original text using a terminology recognition model (such as a sequence labeling model based on BiLSTM-CRF), such as "SWOT analysis," "OKR," and "Agile development." Then, for each identified term, the agent generates an explanation in two ways: if the term already has an explanation entry in the internal knowledge base, it is directly referenced; if the term is new, a concise explanation is generated by querying external knowledge sources (such as Wikipedia or a specialized dictionary) or calling a large language model. The explanation is limited to 50 characters to meet the user's need for quick understanding in workplace learning scenarios. These terminology explanations will be presented in the form of "knowledge point hints" or "glossary entries" in subsequently generated knowledge units, helping users understand core concepts without interrupting their learning flow.
[0135] The third agent is responsible for adapting the rewritten content to a specified workplace learning format. The input to this agent is the structured content processed by the first two agents (including a summary skeleton and explanations of core terminology), and the output is formatted content tailored to the specific learning format. The learning format is specified by the content production request and can take various forms such as micro-lesson scripts, case studies, script libraries, mind maps, and simulation exercises. Different learning formats require different content organization and expression methods.
[0136] For the micro-lesson script, the agent organizes the content into a linear structure of "introduction - explanation of knowledge points - case analysis - summary and review" and inserts interactive questions at key points.
[0137] For the case study set, the agent extracts typical scenarios from the original text, reorganizes the case narrative according to the framework of "background-challenge-solution-effect-inspiration", and extracts reusable principles.
[0138] For the script library, the agent transforms the communication strategies in the original text into concrete dialogue templates, such as "When a customer raises a price objection, the following scripts can be used...", and marks the applicable scenarios and usage precautions.
[0139] For mind maps, the agent converts the hierarchical structure of the content into an indented list in Markdown format, making it easier to import into mind mapping tools later.
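The mind-map conversion amounts to a depth-first walk that writes one indented bullet per node. This sketch assumes the hierarchy arrives as nested (title, children) tuples and uses two spaces of indentation per level; both choices are illustrative. The node titles below follow the "situational leadership" skeleton described earlier.

```python
from typing import List, Tuple

Node = Tuple[str, list]  # (title, children)

def outline_lines(node: Node, depth: int = 0) -> List[str]:
    title, children = node
    lines = ["  " * depth + "- " + title]        # two spaces of indent per level
    for child in children:
        lines.extend(outline_lines(child, depth + 1))
    return lines

def to_markdown_outline(root: Node) -> str:
    return "\n".join(outline_lines(root))

tree = ("Situational leadership", [
    ("Core concept", []),
    ("Four leadership styles", [("Applicable conditions", [])]),
    ("Practical suggestions", []),
])
print(to_markdown_outline(tree))
```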
[0140] These three agents collaborate through shared context memory: the summary skeleton output by the first agent serves as the input context for the second agent; the terminology explanations provided by the second agent then serve as reference information for the third agent. A conflict resolution mechanism is also in place; for example, when different agents disagree on the same concept, the final version is determined through voting or priority rules. Compared to a single end-to-end generative model, this multi-agent collaborative architecture provides better control over the generation process, ensuring that the output content meets high standards in terms of logic, knowledge accuracy, and formal adaptability.
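The shared-context flow and one possible conflict rule can be sketched as follows. The three agents here are deliberately trivial, rule-based stand-ins for the LLM-based agents in the text, and `GLOSSARY`, the agent functions, and the majority-vote-with-priority tie-break are all hypothetical; the point is only the data flow, in which each agent reads everything earlier agents wrote. The 50-character cap on explanations follows the terminology agent's description.

```python
from collections import Counter
from typing import Callable, Dict, List

Context = Dict[str, object]
Agent = Callable[[Context], Context]

# Hypothetical internal knowledge base consulted by the terminology agent.
GLOSSARY = {"OKR": "Objectives and Key Results",
            "SWOT analysis": "Strengths, weaknesses, opportunities, threats"}

def skeleton_agent(ctx: Context) -> Context:
    # Stand-in for LLM structure analysis: one skeleton item per sentence.
    return {"skeleton": [s.strip() for s in str(ctx["fragment"]).split(".") if s.strip()]}

def glossary_agent(ctx: Context) -> Context:
    # Explanations are capped at 50 characters.
    return {"glossary": {t: d[:50] for t, d in GLOSSARY.items() if t in str(ctx["fragment"])}}

def format_agent(ctx: Context) -> Context:
    lines = ["- " + s for s in ctx["skeleton"]]
    lines += [f"Glossary: {t}: {d}" for t, d in ctx["glossary"].items()]
    return {"output": "\n".join(lines)}

def run_pipeline(fragment: str, agents: List[Agent]) -> Context:
    context: Context = {"fragment": fragment}   # shared context memory
    for agent in agents:
        context.update(agent(context))          # each agent sees all earlier outputs
    return context

def resolve_conflict(proposals: Dict[str, str], priority: List[str]) -> str:
    """Majority vote over the agents' versions; ties broken by agent priority."""
    counts = Counter(proposals.values())
    best = max(counts.values())
    tied = {text for text, n in counts.items() if n == best}
    for name in priority:
        if proposals.get(name) in tied:
            return proposals[name]
    return sorted(tied)[0]
```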
[0141] During content generation, the module applies a knowledge tracing mechanism so that each generated knowledge unit can be traced back to its original copyrighted source. The mechanism serves two purposes, copyright compliance and content credibility. On the one hand, the system must strictly adhere to the copyright agreement with CITIC Publishing Group and ensure that derivative works remain within the scope authorized for the original copyrighted content; on the other hand, users may wish to consult the original material in more detail while learning, and tracing supports this kind of in-depth reading.
[0142] In practice, the knowledge tracing mechanism runs through the entire content generation process. During the content segmentation stage, each content fragment retains complete metadata during storage, including document ID, version number, chapter position, paragraph index, start character position, end character position, copyright identifier, etc. When an agent generates a knowledge unit based on a fragment, this metadata is recorded in the metadata field of the generated content, forming a pointer from the derived content to the original source.
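The pointer from derived content back to its source can be sketched with two dataclasses. Field names mirror the metadata listed above but are illustrative, and `library` is a hypothetical store mapping document IDs to full text; a real system would also resolve the version number and copyright identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class SourcePointer:
    """Metadata kept with every stored fragment (field names are illustrative)."""
    document_id: str
    version: str
    chapter: str
    paragraph_index: int
    start_char: int
    end_char: int
    copyright_id: str

@dataclass
class KnowledgeUnit:
    unit_id: str
    body: str
    sources: List[SourcePointer] = field(default_factory=list)  # derived -> original

def derive_unit(unit_id: str, body: str, fragments: List[Tuple[str, SourcePointer]]) -> KnowledgeUnit:
    """Copy the pointer of every fragment the agents used into the new unit."""
    return KnowledgeUnit(unit_id, body, [pointer for _, pointer in fragments])

def resolve(pointer: SourcePointer, library: Dict[str, str]) -> str:
    """Follow a pointer back to the original text for in-depth reading."""
    return library[pointer.document_id][pointer.start_char:pointer.end_char]
```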
[0143] In addition, the module uses digital watermarking to embed copyright information into the generated content, deterring unauthorized copying and distribution and allowing ownership to be verified. For video content, a frequency-domain watermarking technique based on the DCT (Discrete Cosine Transform) is used: each video frame is divided into 8×8 blocks and a DCT is applied to each block to obtain its frequency-domain coefficients; watermark bits (0/1) are then encoded by slightly adjusting selected mid-frequency coefficients; finally, an inverse DCT restores the image. This frequency-domain watermark is robust against common processing operations such as compression, cropping, and scaling, while the slight coefficient changes are imperceptible to the human eye. For text content, a semantic watermarking technique based on synonym substitution is used: synonym substitution rules are defined, and watermark bits are encoded by choosing a particular synonym without changing the meaning of the sentence. For example, to encode bit 0, all instances of "important" are replaced with "key"; to encode bit 1, they are replaced with "core". This method does not affect the readability of the text, and copyright ownership can be verified by statistically analyzing the frequency of these marker words.
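Embedding one bit in an 8×8 block can be sketched as below. The text only says bits are encoded by "fine-tuning" mid-frequency coefficients, so this sketch substitutes a specific, named technique, quantization index modulation (QIM), on a single coefficient; the coefficient position `POS` and step `STEP` are hypothetical. It uses a pure-Python orthonormal DCT, skips the rounding and clipping of pixel values a real video pipeline would need, and does not demonstrate the robustness properties claimed above.

```python
import math

N = 8             # block size, as in the text
POS = (3, 4)      # hypothetical mid-frequency coefficient carrying the bit
STEP = 12.0       # hypothetical quantization step (embedding strength)

def _alpha(u):
    return math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)

def _basis(x, u):
    return math.cos((2 * x + 1) * u * math.pi / (2 * N))

def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block."""
    return [[_alpha(u) * _alpha(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                          for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coef):
    """Inverse transform (orthonormal 2-D DCT-III), undoing dct2."""
    return [[sum(_alpha(u) * _alpha(v) * coef[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def _quantize(c, bit):
    # Two interleaved lattices: bit 0 -> multiples of STEP, bit 1 -> shifted by STEP / 2.
    offset = bit * STEP / 2
    return STEP * round((c - offset) / STEP) + offset

def embed_bit(block, bit):
    coef = dct2(block)
    u, v = POS
    coef[u][v] = _quantize(coef[u][v], bit)   # move the coefficient to the bit's lattice
    return idct2(coef)

def extract_bit(block):
    c = dct2(block)[POS[0]][POS[1]]
    return min((0, 1), key=lambda b: abs(c - _quantize(c, b)))  # nearest lattice wins
```

Because only one coefficient moves, by at most STEP / 2, each pixel changes by at most STEP / 2 times the basis amplitude (here 1.5 gray levels), which is why such changes are hard to see.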
[0144] The generated workplace knowledge units must undergo quality assessment before being added to the database to ensure they meet workplace learning application standards. The quality assessment employs a multi-dimensional automated evaluation system, covering three core dimensions: content fidelity, readability, and scenario relevance.
[0145] Content fidelity assessment checks whether the generated derivative content accurately reflects the core information of the original fragment, guarding against "hallucination" in AI generation. The assessment uses bidirectional alignment based on semantic similarity: on the one hand, it computes the vector similarity between the generated content and the original fragment; on the other hand, it checks whether key entities in the generated content (such as technical terms, data, and case names) have counterparts in the original fragment. If the similarity is too low or key entities lack counterparts, the content is deemed distorted and is regenerated or sent for manual review. The fidelity threshold is adjusted by content type: case analysis content allows more re-creation, with a threshold of 0.7, whereas content defining principles must adhere strictly to the original text, with a threshold of 0.9.
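The two-sided fidelity check can be sketched as follows, using the 0.7 and 0.9 thresholds from the text. The content-type keys are illustrative names, and matching entities by exact string comparison is an assumption; the text does not say how counterparts are identified (normalization or alias lists would likely be needed in practice).

```python
import math
from typing import Iterable, List, Tuple

# Thresholds from the text; the content-type keys are illustrative names.
FIDELITY_THRESHOLDS = {"case_analysis": 0.7, "principle_definition": 0.9}

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def fidelity_check(generated_vec: List[float], source_vec: List[float],
                   generated_entities: Iterable[str], source_entities: Iterable[str],
                   content_type: str) -> Tuple[bool, float, List[str]]:
    similarity = cosine(generated_vec, source_vec)
    # Entities in the output with no counterpart in the source are possible hallucinations.
    unsupported = sorted(set(generated_entities) - set(source_entities))
    passed = similarity >= FIDELITY_THRESHOLDS[content_type] and not unsupported
    return passed, similarity, unsupported
```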
[0146] Readability assessment determines whether the generated content suits the reading and comprehension level of the target users. The Flesch-Kincaid Grade Level is the primary indicator: it estimates text difficulty from the average sentence length (words per sentence) and the average number of syllables per word, expressed as a US school grade. For workplace learning, a readability level between grades 8 and 10 (roughly the end of junior high school to the first year of senior high school) is generally required so that the content is easy to understand. A terminology density indicator also monitors how often technical terms appear in the generated content; if the density is high enough to hinder comprehension, the agent is required to add explanatory content.
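The Flesch-Kincaid Grade Level uses the standard formula 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below applies it with a rough vowel-group syllable counter, which is an English-only heuristic and an assumption of this sketch; the formula itself is defined for English text, and a dictionary-based syllable counter would be more accurate.

```python
import re

def count_syllables(word: str) -> int:
    """Rough English heuristic: vowel groups, minus a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if n > 1 and word.endswith("e") and not word.endswith("le"):
        n -= 1
    return max(1, n)

def fk_grade(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

def within_target_band(text: str, low: float = 8.0, high: float = 10.0) -> bool:
    """The grade 8-10 band required for workplace learning content."""
    return low <= fk_grade(text) <= high
```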
[0147] The scenario matching evaluation assesses the strength of the association between the generated content and the target business scenario. This evaluation is based on the cosine similarity between the content vector and the enterprise business scenario vector: First, the target scenario (such as "sales negotiation", "project management", "customer service") marked in the content production request is converted into a scenario vector through an embedding model; then, the similarity between the generated content vector and the scenario vector is calculated. If the similarity is too low, it indicates that the generated content may deviate from the expected scenario, and the generation parameters need to be adjusted or the retrieval source needs to be changed.
[0148] Knowledge units that pass the quality assessment are stored in the knowledge unit repository, and their vector indexes and metadata are updated synchronously for use by the personalized recommendation engine. For content that fails, the system automatically records the reason for failure and triggers a fallback process: re-retrieving content fragments, adjusting generation parameters and retrying, or moving the task to a manual review queue. This automated quality control guarantees a minimum quality standard for the knowledge units in the repository and maintains the overall credibility of the system.
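How the three assessment dimensions could feed the store-or-fallback decision is sketched below. The grade band comes from the readability requirement and the fidelity threshold from the content type; `SCENARIO_THRESHOLD`, the retry limit, and the mapping from failure reasons to fallback actions are hypothetical choices, loosely following the text (off-scenario content changes the retrieval source, other failures adjust generation, and exhausted retries go to manual review).

```python
from dataclasses import dataclass
from typing import List, Tuple

SCENARIO_THRESHOLD = 0.5  # hypothetical; the text gives no value
GRADE_BAND = (8.0, 10.0)  # from the readability requirement

@dataclass
class Assessment:
    fidelity: float
    fidelity_threshold: float  # 0.7 or 0.9 depending on content type
    fk_grade: float
    scenario_similarity: float

def route(a: Assessment, attempt: int, max_attempts: int = 2) -> Tuple[str, List[str]]:
    """Return (action, failure reasons); the reasons are what gets logged."""
    reasons = []
    if a.fidelity < a.fidelity_threshold:
        reasons.append("low_fidelity")
    if not GRADE_BAND[0] <= a.fk_grade <= GRADE_BAND[1]:
        reasons.append("readability")
    if a.scenario_similarity < SCENARIO_THRESHOLD:
        reasons.append("off_scenario")
    if not reasons:
        return "store", reasons
    if attempt >= max_attempts:
        return "manual_review", reasons      # automated retries exhausted
    if "off_scenario" in reasons:
        return "re_retrieve", reasons        # try different source fragments
    return "regenerate", reasons             # same sources, adjusted generation parameters
```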
[0149] The core innovation of this module is to turn traditional "manual course development" into "AI-driven secondary creation of knowledge." In the traditional model, developing a workplace micro-course from copyrighted content involves multiple stages, including content selection, outline design, case writing, and review and revision, and typically takes weeks or even months. This module shortens the process to a matter of hours through multimodal preprocessing, RAG retrieval, multi-agent collaborative generation, and automated quality assessment, significantly improving content production efficiency. More importantly, this production method tightly couples authoritative knowledge assets with workplace application scenarios. Rather than simply digitizing book content, it transforms high-value theories into actionable work guidelines, reusable communication templates, and case scenarios for reflection, tailored to the specific needs of workplace learning, moving knowledge from "readable" to "usable."
[0150] Personalized recommendation engine
[0151] In one specific implementation, the personalized recommendation engine employs an improved deep semantic model architecture, extending the DSSM (Deep Structured Semantic Model) dual-tower model to achieve deep interaction and accurate matching between user features and knowledge unit features. The module's operation involves a complete technical chain, from multi-source user feature collection, dynamic weighted encoding, dual-tower similarity calculation, to two-stage recall ranking and cold start processing. Its core lies in solving the problem of accurate recommendation tailored to individual needs and changing requirements in workplace learning scenarios.
[0152] At the user feature processing level, the module constructs a multi-source user profiling system, integrating user feature information from multiple data sources. Static features mainly come from structured data in the enterprise HR system, including job level, department, years of service, historical learning completion rate, and obtained certifications. These features are relatively stable, reflecting the user's basic attributes and long-term ability level. Dynamic behavioral features come from real-time interaction logs between users and the system, including vectorized representations of search keywords in the past 7 days, encoded results of clickstream sequences, distribution of video viewing dwell time, and exit positions of incomplete knowledge units. These features reflect the user's recent learning interests and behavioral patterns, and are highly time-sensitive. Enterprise context features are synchronized in real time from the enterprise profiling module, including vectors of the current enterprise strategic direction, semantic embedding of recent business pain points, and organizational capability gap tags, ensuring that the recommendation results can integrate individual learning needs with the overall strategic direction of the enterprise.
[0153] Specifically, for modeling user behavior sequences, the module employs a time-decay weighting function to assign decreasing weights to historical behaviors. This design is based on the fact that employees' learning needs and skill gaps change dynamically over time—research shows that user interests exhibit time-drift characteristics, and early learning records have limited reference value for current recommendations, while recent behavior better reflects current interest levels. In the specific implementation, time-decay weights are assigned to users' historical interaction behaviors:
[0154] $$w(\Delta t) = e^{-\lambda \Delta t}$$
[0155] Here, Δt is the time interval between a historical interaction and the current time, and λ is the decay coefficient that controls the decay rate. The value of λ is tuned to the learning scenario: for domains where skills update rapidly (such as digital marketing and programming), a larger decay coefficient makes the model focus more on recent behavior; for foundational domains (such as leadership and communication skills), a smaller decay coefficient retains more of the reference value of historical information. With this mechanism, the contribution of earlier interactions in the user behavior sequence decays exponentially, while recent search, click, and viewing behavior receives higher weight, so that recommendations respond promptly to changes in user interests.
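The decay weight w(Δt) = e^(−λΔt) can be applied to a behavior sequence as a weighted average of behavior embeddings. Averaging (dividing by the total weight) is an assumption of this sketch, and the per-domain λ values in `DECAY` (per day) are hypothetical numbers chosen only to show a fast-decaying and a slow-decaying domain.

```python
import math
from typing import List, Tuple

# Hypothetical per-domain decay coefficients (per day): larger = faster forgetting.
DECAY = {"digital_marketing": 0.1, "leadership": 0.01}

def decay_weight(delta_t: float, lam: float) -> float:
    """w(Δt) = exp(-λ·Δt)."""
    return math.exp(-lam * delta_t)

def interest_vector(events: List[Tuple[float, List[float]]], now: float, lam: float) -> List[float]:
    """Decay-weighted average of behavior embeddings; events are (day, vector)."""
    total: List[float] = []
    weight_sum = 0.0
    for day, vec in events:
        w = decay_weight(now - day, lam)
        total = [w * x for x in vec] if not total else [t + w * x for t, x in zip(total, vec)]
        weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else []
```

With the larger coefficient, an interaction from ten days ago contributes only a few percent of the result; with the smaller one it still contributes almost half.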
[0156] At the knowledge unit feature processing level, the module integrates the knowledge unit vectors produced by the multimodal content extraction module, as well as the metadata of that knowledge unit. The knowledge unit vectors are generated by the multimodal content extraction module, and their dimensions are consistent with the user feature vectors to facilitate subsequent similarity calculations. The metadata includes: ability tags (related ability IDs, such as "data analysis" or "team incentives"), difficulty level (beginner / intermediate / advanced), estimated learning time (in minutes), associated job positions (applicable to which positions), and positive review rate (based on feedback statistics from historical learners). This metadata provides a rich foundation of cross-features for subsequent refined ranking.
[0157] The dual-tower model encodes user features and knowledge unit features separately. Its basic idea is to construct independent sub-network tower structures for users and items, train them using interactive data, and ultimately obtain the embeddings on both the user and item sides. Recommendations are then made by calculating the similarity between the two. The advantage of this structure is that the embeddings for user and item sides can be pre-calculated offline, and only fast vector retrieval is needed during online service, meeting the latency requirements of large-scale real-time recommendations.
[0158] However, the classic dual-tower model has an inherent flaw: user-side features and item-side features only interact during the final inner product calculation; before that, the two towers operate completely independently. This results in a significant amount of fine-grained feature interaction information not being fully utilized, sacrificing some of the model's accuracy. To address this issue, this module introduces a dynamic attention feature interaction layer during the encoding process.
[0159] Specifically, for multiple feature domains in user features (such as job title feature domain, behavioral feature domain, and enterprise context feature domain), the SENet (Squeeze-and-Excitation Networks) structure is used to calculate the global importance weights of each feature domain, and the weighted features are then input into subsequent fully connected layers. SENet was initially proposed by the Momenta team for image processing, achieving feature recalibration by explicitly modeling the interdependencies between feature channels. In recommender systems, SENet has been introduced for feature selection—suppressing noise or invalid low-frequency features with small weights and amplifying the influence of important features with large weights.
[0160] The application of SENet in this module involves three steps: Squeeze, Excitation, and Reweight.
[0161] The Squeeze operation compresses the embedding vector of each feature domain into a scalar using global average pooling, summarizing the global information of that feature domain. For the embedding vector e_i (dimension k) of the i-th feature domain, the compressed scalar z_i is:
[0162] $$z_i = \frac{1}{k} \sum_{t=1}^{k} e_i^{(t)}$$
where e_i^{(t)} is the t-th element of e_i.
[0163] This compression operation allows each feature domain to be characterized by a real number representing its overall response strength. This real number has a global receptive field and can reflect the degree of activation of the feature domain in the whole.
[0164] The excitation operation learns the weights of each feature domain through two fully connected layers. This step is similar to the gating mechanism in recurrent neural networks, explicitly modeling the correlation between feature domains through parameter learning. The specific calculation formula is as follows:
[0165] $$A = \sigma_2\left(W_2 \cdot \mathrm{ReLU}(W_1 Z)\right)$$
[0166] Here, Z is the compressed vector output by the Squeeze stage (dimension f, the number of feature domains); W1 is the weight matrix of the first fully connected layer (dimension (f/r) × f, i.e., f/r rows and f columns); r is a reduction ratio (16 in the original SENet design) that reduces the number of parameters and the computational cost; with only a few feature domains, as here, r must be small enough that f/r is at least 1. The ReLU activation function introduces non-linearity; W2 is the weight matrix of the second fully connected layer (dimension f × (f/r), i.e., f rows and f/r columns), which restores the dimension to f; and σ2 is the Sigmoid activation function, which outputs the importance weight A of each feature domain, with values between 0 and 1. In essence, the two fully connected layers correlate the feature domains with one another and dynamically determine which feature domains matter most in the current context.
[0167] The reweight operation multiplies the weights learned in the excitation phase with the original features, thus completing the feature recalibration:
[0168] $$V = [a_1 e_1,\ a_2 e_2,\ \dots,\ a_f e_f]$$
[0169] where $a_i$ is the weight of the i-th feature domain, $e_i$ is its original embedding, and the weighted feature V is used as the input to the subsequent fully connected layer. After recalibration, important feature domains are amplified while unimportant ones are suppressed, achieving adaptive feature selection.
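The three SENet steps can be sketched in plain Python as follows. This is a minimal, untrained illustration under stated assumptions: the three feature domains, the embedding size k = 4, the hidden width of 2 (standing in for f/r) and the random weights are all hypothetical, and a real model would learn W1 and W2 by backpropagation.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def senet_reweight(embeddings, W1, W2):
    """Squeeze, excite and reweight f feature-domain embeddings.

    embeddings: list of f vectors, each of length k (one per feature domain)
    W1: (f/r) x f matrix; W2: f x (f/r) matrix
    """
    # Squeeze: global average pooling gives one scalar z_i per domain
    z = [sum(e) / len(e) for e in embeddings]
    # Excitation: A = sigmoid(W2 . ReLU(W1 . z)), one weight in (0, 1) per domain
    hidden = [relu(h) for h in matvec(W1, z)]
    a = [sigmoid(s) for s in matvec(W2, hidden)]
    # Reweight: scale every component of domain i by a_i, then concatenate into V
    V = [a_i * x for a_i, e in zip(a, embeddings) for x in e]
    return a, V


# Hypothetical setup: 3 domains (job, behaviour, enterprise context), k = 4, hidden width 2
rng = random.Random(0)
f, k, hidden_dim = 3, 4, 2
emb = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(f)]
W1 = [[rng.gauss(0, 0.5) for _ in range(f)] for _ in range(hidden_dim)]
W2 = [[rng.gauss(0, 0.5) for _ in range(hidden_dim)] for _ in range(f)]
weights, V = senet_reweight(emb, W1, W2)
print([round(w, 3) for w in weights], len(V))
```

Setting W2 to zeros makes every domain weight exactly 0.5, a quick sanity check that the Sigmoid gate always lies between 0 and 1.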
[0170] The rationale behind this dynamic attention feature cross-layer design lies in the significant differences in the feature domains that dominate recommendation decisions across different scenarios. For example, in the early stages of a new employee's employment, the weights of job and departmental features should be higher than historical behavioral features, as sufficient interaction data is lacking at this time. During a business sprint, the weights of corporate strategic features (such as the current key business direction) should be higher than personal interest features, ensuring that training resources are tilted towards strategic priorities. For senior employees, their historical behavioral patterns may better reflect their true professional development direction. The SENet structure can adaptively adjust the weight distribution of each feature domain based on the input data, enabling the model to focus on the most relevant decision information in different contexts, thereby improving the accuracy of recommendations.
[0171] After encoding, the user vectors output by the user tower and the knowledge unit vectors output by the knowledge unit tower are compared using cosine similarity calculation to generate a matching score. To improve model performance, the module also introduces two key technologies: normalization and temperature coefficient.
[0172] First, the embeddings output from the user side and the item side are L2-normalized:
[0173] $$\hat{u} = \frac{u}{\lVert u \rVert_2}, \qquad \hat{v} = \frac{v}{\lVert v \rVert_2}$$
[0174] where $\hat{u}$ is the normalized user vector, $\hat{v}$ is the normalized knowledge unit vector, and $\lVert u \rVert_2$ and $\lVert v \rVert_2$ are the L2 norms (magnitudes) of the user vector u and the knowledge unit vector v, respectively.
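[0175] The main reason for normalization is that the raw vector dot product is not a metric (it does not satisfy the triangle inequality), so ranking by it in a non-metric space can produce counterintuitive results. After normalization, the dot product equals cosine similarity and maps directly onto Euclidean distance: for unit vectors $\hat{u} = u/\lVert u \rVert_2$ and $\hat{v} = v/\lVert v \rVert_2$, $\lVert \hat{u} - \hat{v} \rVert_2^2 = 2 - 2\,\hat{u} \cdot \hat{v}$. Ranking by the normalized dot product is therefore equivalent to ranking by Euclidean distance, which gives the vector space proper metric properties.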
[0176] Second, the inner product of the normalized vectors is divided by the temperature coefficient τ. With u the user feature vector output by the user tower and v the knowledge unit feature vector output by the knowledge unit tower, the final matching score s(u, v) is:
[0177] $$s(u, v) = \frac{u \cdot v}{\tau\,\lVert u \rVert_2\,\lVert v \rVert_2}$$
[0178] The role of the temperature coefficient is to adjust the smoothness of the similarity distribution—a smaller τ makes the distribution after softmax sharper, with the model focusing more on the most similar positive and negative samples; a larger τ makes the distribution smoother, with more even attention to all samples. By adjusting τ, the convergence effect and final performance of model training can be optimized.
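A minimal sketch of the normalized, temperature-scaled score and its effect on a softmax, assuming made-up three-dimensional user and knowledge unit vectors (the values are illustrative, not from the source):

```python
import math


def l2_normalize(x):
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]


def match_score(u, v, tau=0.1):
    """s(u, v) = cos(u, v) / tau, computed from L2-normalised vectors."""
    u_hat, v_hat = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u_hat, v_hat)) / tau


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


user = [0.2, 0.9, 0.1]
units = [[0.1, 1.0, 0.0], [1.0, 0.1, 0.3], [0.5, 0.5, 0.5]]
p_smooth = softmax([match_score(user, item, tau=1.0) for item in units])
p_sharp = softmax([match_score(user, item, tau=0.1) for item in units])
print([round(p, 3) for p in p_smooth])
print([round(p, 3) for p in p_sharp])
```

With τ = 0.1 the softmax concentrates most of its mass on the closest knowledge unit, while τ = 1.0 spreads it more evenly, matching the smoothing behaviour described above.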
[0179] The model training employs a binary cross-entropy loss function, using completed knowledge units from the user's history as positive samples and randomly sampled non-interactive units as negative samples. The training objective is to minimize the prediction error between positive and negative samples, maximizing the similarity of positive sample pairs and minimizing the similarity of negative sample pairs.
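The training objective can be illustrated with a small sketch of binary cross-entropy over completed-unit positives and randomly sampled negatives. The catalogue, the number of negatives per positive, and the two toy scoring functions are hypothetical; in the actual model the score would be s(u, v) from the dual towers.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce(score, label, eps=1e-12):
    """Binary cross-entropy on a raw matching score passed through a sigmoid."""
    p = sigmoid(score)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


def build_training_pairs(completed, catalogue, neg_per_pos, rng):
    """Completed units are positives (label 1); non-interacted units are sampled as negatives (label 0)."""
    candidates = [unit for unit in catalogue if unit not in completed]
    pairs = [(unit, 1) for unit in sorted(completed)]
    for _ in completed:
        for unit in rng.sample(candidates, min(neg_per_pos, len(candidates))):
            pairs.append((unit, 0))
    return pairs


def mean_bce(pairs, score_fn):
    return sum(bce(score_fn(unit), label) for unit, label in pairs) / len(pairs)


rng = random.Random(42)
catalogue = [f"ku{i}" for i in range(20)]
completed = {"ku1", "ku5"}
pairs = build_training_pairs(completed, catalogue, neg_per_pos=3, rng=rng)


def good_model(unit):  # hypothetical well-trained scorer
    return 3.0 if unit in completed else -3.0


def untrained_model(unit):  # scores everything as 0, i.e. p = 0.5
    return 0.0


print(round(mean_bce(pairs, good_model), 3), round(mean_bce(pairs, untrained_model), 3))
```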
[0180] To address the cold start problem in recommendation systems, the module employs a dedicated mechanism. The cold start problem primarily occurs in two scenarios: first, new employees joining the company, lacking historical interaction data; and second, newly launched knowledge units, which have not yet accumulated user feedback. For new employees or users with sparse historical behavior, the module uses a knowledge graph-based embedding method.
[0181] Specifically, the process begins by constructing a job knowledge graph. Nodes include entities such as job title, department, skills, and certifications, while edges represent relationships between entities (e.g., "Data analysts need to master SQL skills"). The new employee's attributes (job title, department, and start date) are mapped into the knowledge graph space, and a graph neural network aggregates neighbor node information to generate the employee's knowledge graph embedding vector. Next, the system searches the knowledge graph for the group of experienced employees most similar to this employee: by computing the similarity between embedding vectors, it identifies senior employees with similar job titles, departments, and skill requirements. An initial recommendation list is then generated from the group preferences of these similar employees (the knowledge units they have completed, the content they rated highly, and their learning paths).
[0182] Based on this, the initial list is reordered according to the company's current strategic priorities. For example, if the company's current strategic priority is "digital transformation," then knowledge units related to digitalization are recommended first; if it is in the "new product launch" stage, then product knowledge-related content is recommended first. This reordering mechanism ensures that the recommendations in the cold start phase follow both "similar group experience" and "company strategic orientation," achieving an initial integration of individual and organizational needs.
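A minimal sketch of the cold-start flow, assuming the knowledge graph embeddings have already been produced by the graph neural network (the vectors below are hypothetical placeholders) and using a fixed additive bonus as a stand-in for the strategic reordering, whose exact form the source does not specify:

```python
import math
from collections import defaultdict


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cold_start_list(new_emb, seniors, unit_tags, strategic_tags, top_n=2, boost=2.0):
    """seniors maps employee id -> (KG embedding, list of completed unit ids)."""
    # 1. Most similar experienced employees in the knowledge-graph embedding space
    peers = sorted(seniors.values(), key=lambda s: cosine(new_emb, s[0]), reverse=True)[:top_n]
    # 2. Group preference: score each unit by the summed similarity of peers who completed it
    scores = defaultdict(float)
    for emb, completed in peers:
        sim = cosine(new_emb, emb)
        for unit in completed:
            scores[unit] += sim

    # 3. Strategic reordering: add a fixed bonus to units tagged with a current priority
    def final_score(unit):
        return scores[unit] + (boost if unit_tags.get(unit, set()) & strategic_tags else 0.0)

    return sorted(scores, key=final_score, reverse=True)


seniors = {
    "e1": ([1.0, 0.0, 0.2], ["sql_basics", "excel_pivots"]),
    "e2": ([0.9, 0.1, 0.3], ["sql_basics", "digital_marketing_101"]),
    "e3": ([0.0, 1.0, 0.0], ["welding_safety"]),
}
unit_tags = {"digital_marketing_101": {"digital transformation"}}
new_hire = [1.0, 0.05, 0.25]  # hypothetical GNN output for the new employee
print(cold_start_list(new_hire, seniors, unit_tags, {"digital transformation"}))
```

Without any strategic tag the list follows pure group preference; adding "digital transformation" as the current priority moves the tagged unit to the top.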
[0183] The recommendation results are not produced as a one-time static output; instead, a two-stage filtering strategy is used, which is the standard architecture for industrial-grade recommendation systems. The first stage uses the dual-tower model for rapid recall, retrieving the few hundred candidate knowledge units most relevant to the user from a massive knowledge unit database. This stage prioritizes speed and breadth: it narrows the candidate set as quickly as possible while maintaining recall. The advantage of the dual-tower model at this stage is that user-side and item-side embeddings can be precomputed offline, so online serving requires only a vector retrieval, keeping latency at the millisecond level.
[0184] The second stage employs a ranking model to refine the candidate set. The ranking model utilizes deep learning models capable of fully modeling feature interactions, such as DeepFM and xDeepFM. Compared to the dual-tower model used in the recall stage, the ranking model can incorporate richer cross-features, such as: the matching degree between user job title and knowledge unit difficulty level (pushing overly difficult content to entry-level positions may lead to frustration), the fit between user's historical learning time distribution and knowledge unit duration (pushing short content during fragmented time), and the correlation between user's past learning completion rate and knowledge unit type (prioritizing videos with high completion rates). By modeling these high-order feature interactions, the ranking model accurately ranks the candidate set, ensuring that the top few knowledge units pushed in the final batch are highly matched to the user across multiple dimensions.
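The recall-then-rank split can be sketched as follows. The recall stage here is a brute-force inner product over precomputed item embeddings (a production system would use an approximate nearest neighbour index), and the ranking stage uses a hand-weighted linear score over two of the cross features named above as a simplified stand-in for DeepFM/xDeepFM; all weights, fields and data are hypothetical.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def recall(user_vec, item_vecs, n):
    """Stage 1: inner-product retrieval over precomputed item embeddings (brute force here)."""
    scored = [(dot(user_vec, vec), item) for item, vec in item_vecs.items()]
    return {item: score for score, item in heapq.nlargest(n, scored)}


def rank(candidates, user, meta, weights=(1.0, 0.5, 0.5), k=2):
    """Stage 2: re-score candidates with hand-made cross features (stand-in for DeepFM)."""
    w_rel, w_diff, w_dur = weights

    def score(item):
        difficulty_match = -abs(user["level"] - meta[item]["difficulty"])
        duration_fit = 1.0 if meta[item]["minutes"] <= user["typical_session_min"] else 0.0
        return w_rel * candidates[item] + w_diff * difficulty_match + w_dur * duration_fit

    return sorted(candidates, key=score, reverse=True)[:k]


item_vecs = {"ku_a": [0.9, 0.1], "ku_b": [0.8, 0.3], "ku_c": [0.1, 0.9], "ku_d": [0.7, 0.2]}
meta = {
    "ku_a": {"difficulty": 3, "minutes": 40},
    "ku_b": {"difficulty": 1, "minutes": 8},
    "ku_c": {"difficulty": 1, "minutes": 5},
    "ku_d": {"difficulty": 1, "minutes": 10},
}
user = {"vec": [1.0, 0.2], "level": 1, "typical_session_min": 15}
candidates = recall(user["vec"], item_vecs, n=3)
print(list(candidates), rank(candidates, user, meta))
```

Here the most relevant unit from recall (`ku_a`) is dropped at ranking because it is too difficult for an entry-level user and too long for their typical session.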
[0185] The push notifications support multiple devices, including PC web pages, mobile apps, and embedded applications within WeChat Work, and can be triggered based on the user's real-time status. For example, when the system detects via API that a user is handling a customer complaint in the CRM system, it can instantly push relevant "objection handling scripts" micro-courses; when a user creates a new project in the project management software, it can push a "project kick-off meeting process" case study set. This real-time triggering mechanism based on business scenarios embeds learning into the workflow, achieving real-time support for "learning on demand."
[0186] Furthermore, the module incorporates a dynamic feedback optimization mechanism based on reinforcement learning. The system continuously tracks user feedback behavior on pushed content (clicks, completions, favorites, shares, subsequent performance changes, etc.) and dynamically adjusts the recommendation strategy through reinforcement learning algorithms. For example, if a certain type of knowledge unit is found to significantly improve the performance of employees in a specific role, the recommendation weight of similar content will be increased for other employees in that role; if the bounce rate of a certain type of content is too high, its recommendation priority will be reduced or a content quality review will be triggered. This closed-loop mechanism of "recommendation-feedback-optimization" enables the recommendation system to continuously learn and evolve, constantly improving its recommendation effectiveness.
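The source does not name a specific reinforcement learning algorithm. As a simplified stand-in, the feedback loop can be sketched as an epsilon-greedy multi-armed bandit over content categories, where the reward mapping from feedback events (clicks, completions, bounces, and so on) is a hypothetical choice:

```python
import random

# Hypothetical reward for each feedback signal
REWARDS = {"click": 0.1, "complete": 0.5, "favorite": 0.3, "share": 0.3, "bounce": -0.5}


def reward_from(events):
    return sum(REWARDS.get(e, 0.0) for e in events)


class CategoryBandit:
    """Epsilon-greedy bandit over content categories (simplified stand-in for the RL policy)."""

    def __init__(self, categories, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.counts = {c: 0 for c in categories}
        self.values = {c: 0.0 for c in categories}

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best estimate
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def update(self, category, reward):
        # Incremental mean of observed rewards: V <- V + (r - V) / n
        self.counts[category] += 1
        self.values[category] += (reward - self.values[category]) / self.counts[category]


bandit = CategoryBandit(["case_study", "micro_course", "script"], epsilon=0.0)
bandit.update("case_study", reward_from(["click", "bounce"]))
bandit.update("micro_course", reward_from(["click", "complete", "share"]))
print(bandit.choose(), bandit.values)
```

A contextual bandit or a full RL policy would additionally condition on the user's role and context; this sketch only shows the recommend, feedback, update cycle.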
[0187] In summary, the personalized recommendation engine achieves deep personalization through multiple technical means, including multi-source feature fusion, dynamic attention weighting, two-stage recall and ranking, knowledge-graph-based cold start, and real-time scenario triggering. Compared with traditional tag-based recommendation systems, this solution significantly improves feature interaction depth (from a top-level inner product to bottom-level feature crossing), context awareness (from static profiles to dynamic business scenarios), and cold-start handling (from no usable interaction history to knowledge graph transfer), truly delivering learning support that is "personalized for each user and adaptive to changing needs."
[0188] The four modules described above work collaboratively through standardized data interfaces and message queues: the enterprise profile building module outputs profile data that is synchronized to the knowledge graph database in real time; the dynamic learning system generation module listens for profile update events and automatically triggers system reconstruction; the multimodal content extraction module prioritizes the production of scarce knowledge units based on system changes; and the personalized recommendation engine integrates real-time user behavior with strategic guidance to achieve precise push notifications. The entire system is deployed using Kubernetes-based containers, supporting elastic scaling to handle peak recommendation requests, and a complete monitoring and logging system has been established to ensure high availability and observability.
[0189] Example 2
[0190] In one specific implementation, an AI-based personalized workplace learning method for enterprises is implemented based on the above system. The method includes the following steps, with a closed-loop data flow and feedback optimization mechanism formed between each step.
[0191] Step S1: Construct a digital profile of the enterprise that includes industry, business, and culture.
[0192] This step transforms the multidimensional characteristics of a complex organization like an enterprise into a machine-understandable digital representation through the collection, cleaning, semantic analysis, and feature fusion of multi-source heterogeneous data. First, a multi-source data collection channel is established: at the external data level, web crawling technology is used to capture research reports, policy documents, competitor activities, and market sentiment related to the enterprise's industry in real time. A distributed crawling system based on the Scrapy framework is used during the collection process, with a compliant access policy based on the Robots Exclusion Protocol. Appropriate User-Agent identifiers are set in the request headers, and random delays are added between requests to avoid excessive pressure on the target server. At the internal data level, an encrypted connection is established with the enterprise database through a JDBC connection pool to extract structured and unstructured data such as organizational structure, job descriptions, performance appraisal results, business process SOP documents, anonymized internal communication records, and corporate culture manuals.
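In Scrapy, the compliance policy described above maps onto the built-in `ROBOTSTXT_OBEY`, `USER_AGENT`, `DOWNLOAD_DELAY` and `RANDOMIZE_DOWNLOAD_DELAY` settings. To show the underlying logic without a Scrapy project, here is a standard-library sketch; the crawler name, delay range and URLs are hypothetical, and the `sleep` parameter is injectable so the plan can be checked without waiting.

```python
import random
import time
from urllib import robotparser

USER_AGENT = "EnterpriseProfileBot/0.1"  # hypothetical crawler identifier


def make_policy(robots_lines):
    """Build a robots.txt policy from already-downloaded robots.txt lines."""
    policy = robotparser.RobotFileParser()
    policy.parse(robots_lines)
    return policy


def polite_crawl_plan(urls, policy, min_delay=1.0, max_delay=3.0, rng=random, sleep=time.sleep):
    """Keep only URLs robots.txt allows, pausing a random interval before each request."""
    allowed = []
    for url in urls:
        if not policy.can_fetch(USER_AGENT, url):
            continue
        sleep(rng.uniform(min_delay, max_delay))
        allowed.append(url)  # the real HTTP request (with the User-Agent header) would go here
    return allowed


robots = ["User-agent: *", "Disallow: /private/"]
urls = ["https://example.com/reports/2025", "https://example.com/private/hr"]
delays = []
plan = polite_crawl_plan(urls, make_policy(robots), sleep=delays.append)
print(plan, delays)
```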
[0193] The collected raw data first enters the data cleaning and preprocessing unit, where rule-based data cleaning algorithms are used to remove noise: for numerical data, the rules include range verification, missing value handling, and outlier detection; for text data, HTML tags and special symbols are removed using regular expressions, Chinese word segmentation is performed using the jieba word segmenter combined with a custom enterprise terminology dictionary, stop words are removed, and stemming is performed on English terms.
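The text and numeric cleaning rules can be sketched with regular expressions. The stop word list is a tiny illustrative sample, and `tokenize` defaults to whitespace splitting; for Chinese text the jieba segmenter's `jieba.lcut` could be passed instead (after loading the enterprise terminology dictionary with `jieba.load_userdict`). Stemming of English terms and outlier detection are omitted for brevity.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
SYMBOL_RE = re.compile(r"[^\w\s]")  # \w is Unicode-aware, so CJK characters survive
STOP_WORDS = {"the", "a", "an", "of", "and", "的", "了"}  # tiny illustrative list


def clean_text(raw, tokenize=str.split):
    """Strip HTML tags and symbols, tokenize, lowercase and drop stop words."""
    text = html.unescape(TAG_RE.sub(" ", raw))
    text = SYMBOL_RE.sub(" ", text)
    tokens = (t.lower() for t in tokenize(text))
    return [t for t in tokens if t.strip() and t not in STOP_WORDS]


def clean_numeric(values, low, high):
    """Missing-value handling and range verification: drop None and out-of-range values."""
    return [v for v in values if v is not None and low <= v <= high]


print(clean_text("<p>The growth of &amp; digital sales!</p>"))
print(clean_numeric([3.5, None, 120, 4.2], low=0, high=5))
```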
[0194] The preprocessed data is input into a multimodal semantic analysis engine, which adopts a hybrid architecture design to adapt to the semantic characteristics of different data forms: For long texts such as industry research reports and policy documents, a pre-trained BERT-large model is used for paragraph-level semantic encoding. This model has been pre-trained on the Wikipedia Chinese corpus and the financial and economic corpus to accurately capture the semantic relationships between industry terms; For short texts such as corporate culture manuals and value keywords, the Sentence-BERT model is used to generate sentence-level semantic vectors. This model introduces a contrastive learning loss function during training, making sentences with similar semantics closer in the vector space; For business process SOP documents, a graph neural network model is introduced, using process nodes as graph nodes and the temporal and logical relationships between nodes as edges to construct a knowledge graph representation of the business process, thereby preserving the structural topological information of the process in the vector space.
[0195] To address the semantic alignment issue among multi-source heterogeneous data, a cross-modal alignment layer is established. Through a contrastive learning mechanism, industry feature vectors, business capability graph vectors, and cultural value dimension vectors are mapped to a unified semantic space, ensuring joint inference of enterprise features from different modalities. In the feature fusion stage, a hierarchical attention mechanism is employed to dynamically weight and fuse the aforementioned multi-source features: first, three feature subspaces are constructed, and a multi-head self-attention mechanism is introduced to calculate the interaction weights between different feature subspaces. The importance weights of each feature domain are learned through the SENet (Squeeze-and-Excitation Networks) structure. The weighted features are then concatenated to form the final multi-dimensional digital profile of the enterprise. This profile is stored in a vector database in vector form and includes interpretable tagged representations, such as "Industry = High-end Equipment Manufacturing," "Development Stage = Rapid Growth Stage," "Culture Orientation = Results-Oriented," and "Core Competency Weakness = Digital Marketing." The reason for adopting this hierarchical attention fusion mechanism is that the industry attributes, business models and cultural characteristics of different enterprises have significantly different influence weights on the construction of the learning system. The attention mechanism can adaptively adjust the weight allocation according to the input data, thereby ensuring the accuracy and adaptability of the profile representation.
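A simplified sketch of the attention-based fusion across the three feature subspaces: it uses a single attention head with identity query/key/value projections, omits the SENet weighting step, and assumes the three subspace vectors share one dimension. The multi-head, learned-projection version described above would add trainable matrices per head.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(vectors[0])
    weights = [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vectors])
        for q in vectors
    ]
    outputs = [[sum(w * v[j] for w, v in zip(row, vectors)) for j in range(d)] for row in weights]
    return weights, outputs


def fuse_profile(industry, business, culture):
    """Let the three subspaces attend to each other, then concatenate the results."""
    weights, attended = self_attention([industry, business, culture])
    profile = [x for vec in attended for x in vec]
    return weights, profile


weights, profile = fuse_profile([1.0, 0.0], [0.0, 1.0], [0.5, 0.5])  # toy 2-d subspace vectors
print([[round(w, 3) for w in row] for row in weights])
print([round(x, 3) for x in profile])
```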
[0196] Step S2: Based on the large language model, generate a customized workplace learning system according to the enterprise profile.
[0197] This step uses the multi-dimensional digital profile of the enterprise output from step S1 as the core input, combined with a pre-set general workplace competency framework, to generate a personalized enterprise learning system through the reasoning ability of a large language model. First, the enterprise digital profile is converted into structured prompts. The prompts are designed using chain-of-thought technology to guide the large model to complete reasoning step by step: First, based on the industry characteristics in the profile, typical business scenarios and core positions in the industry are identified; second, based on the business competency map in the profile, the current competency weaknesses and strengths of the enterprise are identified; third, based on the cultural values dimension in the profile, the value orientation that the learning content should emphasize is determined; fourth, by synthesizing the above analysis, a competency model, a job learning path map, and key knowledge domains are generated.
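The chain-of-thought prompt can be assembled directly from the profile's interpretable tags. The wording of each step and the profile dictionary below are hypothetical; the tags reuse the examples given in step S1.

```python
def build_cot_prompt(profile, framework):
    """Turn an enterprise profile (dict of interpretable tags) into a step-by-step prompt."""
    lines = [
        "You are designing a workplace learning system for the enterprise described below.",
        "Reference competency framework: " + ", ".join(framework),
        "Enterprise profile: " + "; ".join(f"{k} = {v}" for k, v in profile.items()),
        "Reason step by step:",
        f"Step 1: From the industry ({profile['Industry']}), identify typical business scenarios and core positions.",
        f"Step 2: From the business capability map (weakness: {profile['Core Competency Weakness']}), "
        "identify current strengths and weaknesses.",
        f"Step 3: From the culture orientation ({profile['Culture Orientation']}), "
        "decide which values the content should emphasise.",
        "Step 4: Combine steps 1-3 and output a competency model, a job learning path map "
        "and key knowledge domains.",
    ]
    return "\n".join(lines)


profile = {
    "Industry": "High-end Equipment Manufacturing",
    "Development Stage": "Rapid Growth Stage",
    "Culture Orientation": "Results-Oriented",
    "Core Competency Weakness": "Digital Marketing",
}
prompt = build_cot_prompt(profile, ["Leadership", "Professional Skills", "General Skills"])
print(prompt)
```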
[0198] In actual operation, a Retrieval-Augmented Generation (RAG) architecture is adopted to improve the professionalism and accuracy of the generated content. Before the prompts are input into the large language model, vector similarity retrieval recalls from the enterprise knowledge base the reference cases most relevant to the current enterprise profile, and these cases are concatenated into the prompts as context. The enterprise knowledge base includes an industry best practices library, a historical training record library, and a performance improvement case library; all of these knowledge assets are vectorized and stored in the vector database shared with step S1. The retrieval process employs a dual recall strategy to balance relevance and diversity: first, a global similarity retrieval based on the enterprise profile vector recalls the top K cases most similar to the current enterprise in industry attributes, development stage, and capability weaknesses; then, a targeted retrieval based on specific profile dimensions (such as cultural values) ensures that the recalled cases match the target enterprise's cultural orientation.
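The dual recall strategy can be sketched as a global top-K search followed by a culture-filtered search, merged without duplicates. The case records, the `culture` field and K = 2 are hypothetical placeholders for the vector database query:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def dual_recall(profile_vec, culture, cases, k=2):
    """Global top-k by profile similarity, plus top-k among cases sharing the culture tag."""
    by_similarity = sorted(cases, key=lambda c: cosine(profile_vec, c["vec"]), reverse=True)
    global_hits = by_similarity[:k]
    targeted_hits = [c for c in by_similarity if c["culture"] == culture][:k]
    merged, seen = [], set()
    for case in global_hits + targeted_hits:  # merge, dropping duplicates
        if case["id"] not in seen:
            seen.add(case["id"])
            merged.append(case)
    return merged


cases = [
    {"id": "c1", "vec": [1.0, 0.0], "culture": "results"},
    {"id": "c2", "vec": [0.9, 0.1], "culture": "innovation"},
    {"id": "c3", "vec": [0.0, 1.0], "culture": "results"},
    {"id": "c4", "vec": [0.8, 0.3], "culture": "innovation"},
]
hits = dual_recall([1.0, 0.05], "results", cases)
print([c["id"] for c in hits])
```

Case `c3` is far from the profile vector but is still recalled by the targeted pass because it shares the enterprise's cultural orientation.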
[0199] The generated learning system is output in JSON-LD format and has a multi-level structure: the top layer is the competency domain (e.g., "Leadership," "Professional Skills," "General Skills"), the middle layer is the competency item (e.g., "Team Motivation," "Data Analysis," "Communication and Expression"), and the bottom layer is the competency indicators and their corresponding learning objectives. The job learning path map establishes a mapping between job, competency, and learning content and is stored as a directed graph, in which nodes represent learning units and edges represent prerequisite/follow-up relationships and dependencies. Key knowledge domains are sets of knowledge points extracted from the competency model, each labeled with metadata such as its relevance to business scenarios, difficulty level, and estimated learning time. This step also includes a system evolution mechanism: the generation process is re-run periodically or upon trigger events, incorporating the enterprise's latest business data so that the learning system iterates dynamically.
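A minimal sketch of the output structure, assuming a placeholder JSON-LD vocabulary (example.org) and hypothetical unit names: a three-level competency model serialized as JSON, and a learning path stored as a directed graph whose topological order respects every prerequisite edge. It uses Python's standard `graphlib` module (Python 3.9+).

```python
import json
from graphlib import TopologicalSorter

# Hypothetical three-level competency model (domain -> item -> indicator)
learning_system = {
    "@context": {"@vocab": "https://example.org/learning#"},  # placeholder vocabulary
    "@type": "LearningSystem",
    "competencyDomains": [{
        "name": "Professional Skills",
        "competencyItems": [{
            "name": "Data Analysis",
            "indicators": [{
                "name": "Builds aggregate SQL queries",
                "learningObjective": "Summarise monthly sales by region without assistance",
            }],
        }],
    }],
}

# Job learning path: each learning unit maps to the units that must come before it
learning_path = {
    "sql_basics": set(),
    "chart_literacy": set(),
    "sql_aggregations": {"sql_basics"},
    "dashboard_design": {"sql_aggregations", "chart_literacy"},
}
order = list(TopologicalSorter(learning_path).static_order())  # raises CycleError on cycles
print(json.dumps(learning_system, indent=2))
print(order)
```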
[0200] Step S3: Based on RAG and AIGC technologies, extract and generate high-concentration workplace knowledge units from the proprietary copyright library.
[0201] This step involves transforming high-quality knowledge resources from CITIC Publishing's proprietary copyrighted content library into lightweight knowledge units suitable for workplace learning. First, a vectorized index of the copyrighted content is constructed: for ebooks, a paragraph-level segmentation strategy is used, comprehensively considering chapter boundaries, paragraph length, and semantic integrity to divide the text into segments of approximately 512 words; for audio and video content, ASR (Automatic Speech Recognition) technology is used to convert speech into text, and scene detection algorithms are employed to segment semantically complete segments; for charts and graphs, a multimodal large model (such as CLIP) is used to extract joint representations of visual features and textual descriptions. All content segments generate high-dimensional vectors through an embedding model and are stored in the Milvus vector database, while an inverted index is built to support keyword retrieval.
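Paragraph-level segmentation can be sketched as greedy packing of paragraphs into chunks of at most 512 units, splitting an oversized paragraph at sentence-ending punctuation. This sketch measures length in characters (an assumption; a token count would be used with a real embedding model) and is applied to one chapter at a time so that chunks never cross chapter boundaries. A single sentence longer than the limit is kept whole.

```python
import re

# Split after Chinese or English sentence-ending punctuation, consuming trailing spaces
SENTENCE_END = re.compile(r"(?<=[。！？.!?])\s*")


def chunk_chapter(text, max_len=512):
    """Greedily pack the paragraphs of one chapter into chunks of at most max_len characters."""
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        pieces = [para] if len(para) <= max_len else [s for s in SENTENCE_END.split(para) if s]
        for piece in pieces:
            if current and len(current) + 1 + len(piece) > max_len:
                chunks.append(current)
                current = piece
            else:
                current = f"{current}\n{piece}" if current else piece
    if current:
        chunks.append(current)
    return chunks


chapter = "\n\n".join(["a" * 300, "b" * 300, "c" * 100])
print([len(c) for c in chunk_chapter(chapter)])
```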
[0202] Upon receiving a content production request (which can originate from the key knowledge domain specified in step S2 or from an immediate need identified in subsequent step S4), the RAG workflow is initiated. First, the requested content is converted into a query vector using the same embedding model. Then, an approximate nearest neighbor search (using the HNSW algorithm) is performed in the vector database to recall the top K content fragments most semantically relevant to the request. This retrieval process considers not only semantic similarity but also introduces a diversity penalty factor (MMR, Maximal Marginal Relevance mechanism) to prevent the recalled results from being overly concentrated in a single source or a single perspective.
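Maximal Marginal Relevance re-selects retrieved fragments by trading relevance against redundancy with what has already been chosen. A minimal sketch, assuming fragment vectors are already available and using `lam` as the relevance/diversity trade-off (the vectors are hypothetical):

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query, candidates, k=3, lam=0.7):
    """Maximal Marginal Relevance: candidates maps fragment id -> vector."""
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        def mmr_score(cid):
            relevance = cosine(query, remaining[cid])
            redundancy = max((cosine(remaining[cid], candidates[s]) for s in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected


fragments = {"a": [1.0, 0.0], "a_dup": [1.0, 0.01], "b": [0.5, 1.0]}
query = [1.0, 0.2]
print(mmr(query, fragments, k=2, lam=1.0))
print(mmr(query, fragments, k=2, lam=0.5))
```

With `lam=1.0` (pure relevance) the near-duplicate fragment `a` is picked second; with `lam=0.5` the more diverse fragment `b` replaces it.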
[0203] The recalled original fragments enter the content restructuring and rewriting stage, which adopts a multi-agent collaborative architecture: one agent is responsible for analyzing the logical structure and core arguments of the original fragments and outputting a summary skeleton; another agent is responsible for identifying key terms and concepts in the fragments, extracting professional terms through a terminology recognition model, and supplementing explanatory content from internal knowledge bases or external knowledge sources; a third agent is responsible for adapting the rewritten content to the specified workplace learning format (micro-course scripts, case studies, script libraries, mind maps, etc.), and organizing and expressing the content differently for different learning formats.
[0204] During content generation, a knowledge tracing mechanism is introduced to ensure that each generated knowledge unit can be traced back to its original copyrighted content source. Metadata such as the original fragment's document ID, paragraph position, and copyright information are preserved during generation, and this metadata is embedded in the generated content's file attributes or digital watermark: for video content, frequency domain watermarking technology based on DCT transform is used; for text content, semantic watermarking technology based on synonym replacement is used, achieving copyright tracking without affecting the reading experience. Generated workplace knowledge units undergo quality assessment before being added to the database. Assessment dimensions include content fidelity (semantic similarity to the original fragment), readability (Flesch-Kincaid level), and scenario matching (strength of relevance to the target business scenario). The core value of this step lies in transforming traditional "manual course development" into "AI-driven secondary creation of knowledge," achieving deep coupling between authoritative knowledge assets and workplace application scenarios.
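The readability dimension can be illustrated with the standard Flesch-Kincaid grade formula, 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59, using a crude vowel-group syllable counter. The formula is defined for English text, and the thresholds in the quality gate are hypothetical.

```python
import re


def count_syllables(word):
    """Crude heuristic: one syllable per group of consecutive vowels (minimum 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def passes_quality_gate(fidelity, grade, scenario_match, min_fidelity=0.8, max_grade=12.0, min_match=0.6):
    """All three assessment dimensions must clear their (hypothetical) thresholds."""
    return fidelity >= min_fidelity and grade <= max_grade and scenario_match >= min_match


grade = flesch_kincaid_grade("The cat sat on the mat.")
print(round(grade, 2), passes_quality_gate(0.9, grade, 0.7))
```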
[0205] Step S4: Combining employee tags, the knowledge units are personalized and pushed to employee terminals through a recommendation model.
[0206] This step employs an improved deep semantic model architecture, extending the DSSM (Deep Structured Semantic Model) dual-tower model to achieve deep interaction and precise matching between user features and knowledge unit features. At the user feature processing level, a multi-source user profile system is constructed: static features include structured data from the HR system such as job title, job level, years of service, historical learning completion rate, and certification qualifications; dynamic behavioral features include vectorized representations of search keywords from the past 7 days, LSTM encoding results of clickstream sequences, distribution of video viewing dwell time, and exit positions of incomplete knowledge units, among other real-time behavioral data; enterprise context features are synchronized in real-time from step S1, including the current enterprise strategic direction vector, semantic embedding of recent business pain points, and organizational capability gap labels. For modeling user behavior sequences, a time-decay weighting function is used to assign decreasing weights to historical behaviors, making recent behaviors contribute more to recommendations. This design is based on the fact that employees' learning needs and capability gaps change dynamically over time, and early learning records have limited reference value for current recommendations.
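The time-decay weighting can be sketched with the exponential form $w(\Delta t) = e^{-\lambda \Delta t}$, the exponential decay function named in claim 5. Here λ is set from an assumed 7-day half-life (λ = ln 2 / 7), which is a hypothetical choice, and the behaviour embeddings are toy vectors.

```python
import math


def decay_weight(delta_t, lam):
    """w(Δt) = exp(-λ Δt): recent behaviour (small Δt) gets a weight close to 1."""
    return math.exp(-lam * delta_t)


def decayed_interest(events, lam=math.log(2) / 7):
    """events: list of (days_ago, behaviour_embedding); returns the decay-weighted mean embedding."""
    weights = [decay_weight(days_ago, lam) for days_ago, _ in events]
    total = sum(weights)
    dim = len(events[0][1])
    return [sum(w * emb[j] for w, (_, emb) in zip(weights, events)) / total for j in range(dim)]


# Hypothetical: a search today on topic A and a click 7 days ago on topic B
print(decayed_interest([(0, [1.0, 0.0]), (7, [0.0, 1.0])]))
```

With a 7-day half-life, last week's click counts half as much as today's search, so the blended interest vector leans two-thirds towards topic A.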
[0207] At the knowledge unit feature processing level, the knowledge unit vector produced in step S3 is integrated with the meta-information of that knowledge unit (ability tag, difficulty level, estimated learning time, associated job, positive review rate, etc.). The dual-tower model encodes user features and knowledge unit features separately, introducing a dynamic attention feature cross-layer during the encoding process: for multiple feature domains in the user features (such as job feature domain, behavioral feature domain, and enterprise context feature domain), the SENet structure is used to calculate the global importance weight of each feature domain, and the weighted features are input into the subsequent fully connected layer. The rationale for this design lies in the fact that the feature domains that dominate recommendation decisions differ in different scenarios, and the attention mechanism can adaptively adjust the importance allocation of features according to the context.
[0208] After encoding, the user vector output by the user tower and the knowledge unit vector output by the knowledge unit tower are compared using cosine similarity calculation to generate a matching score. Model training employs a binary cross-entropy loss function, using completed knowledge units from the user's history as positive samples and randomly sampled non-interactive units as negative samples. To address the cold start problem in recommendation systems, a dedicated processing mechanism is implemented: for new employees or users with sparse historical behavior, a knowledge graph-based embedding method is used to map the user's attribute features to the job knowledge graph space, finding the most similar mature employee group, generating an initial recommendation list based on group preferences, and then reordering the initial list in conjunction with the company's current strategic priorities.
[0209] The recommendation results are generated using a two-stage screening strategy: the first stage uses a dual-tower model to quickly recall a candidate set (hundreds of candidates), and the second stage uses a ranking model (such as DeepFM) to refine the ranking of the candidate set. The ranking model introduces more cross-features, such as the matching degree between the user's job position and the difficulty level of the knowledge unit, and the fit between the user's historical learning time distribution and the duration of the knowledge unit, to ensure that the final content pushed meets both the needs of skill development and the user's learning habits. The push format supports multi-terminal adaptation (PC, mobile, and embedded in WeChat Work) and can trigger push timing based on the user's real-time status. For example, when the system detects that the user is handling customer complaints in the CRM system, it can immediately push relevant dispute handling micro-courses. This step also sets up a dynamic feedback optimization mechanism based on reinforcement learning to continuously track the user's feedback behavior on the pushed content and dynamically adjust the recommendation strategy to form a closed loop of "recommendation-feedback-optimization".
[0210] The four steps above are executed sequentially in the order of S1→S2→S3→S4, forming a continuously optimized closed loop through a data feedback mechanism: user learning behavior data generated in S4 is fed back to S1 to update the employee capability dimension in the enterprise profile, back to S2 to optimize the dynamic adjustment of the learning system, and back to S3 to guide the prioritization of subsequent content production. Through this closed-loop mechanism, the enterprise workplace learning system achieves continuous collaborative evolution with enterprise strategy and employee needs.
[0211] In summary, the present invention provides an AI-based personalized workplace learning system and method for enterprises. By constructing four core components (an enterprise profile building module, a dynamic learning system generation module, a multimodal content extraction and AIGC production module, and a personalized recommendation engine), it forms a complete technical closed loop from the digital representation of enterprise characteristics to the dynamic generation of the learning system, and from the reconstruction of knowledge assets to the delivery of precise recommendations. The method first constructs a multi-dimensional digital profile of an enterprise, encompassing industry characteristics, business capabilities, and cultural values, through multi-source heterogeneous data collection and multimodal semantic analysis, giving this complex organization a computable representation. Then, with a large language model at its core and combined with chain-of-thought prompting and retrieval-augmented generation, it automatically infers from the enterprise profile a personalized learning system that matches the enterprise's strategic development, overcoming the disconnect between traditional static learning systems and actual enterprise needs. Building on this, through the deep integration of RAG and AIGC technologies, it automatically extracts high-quality knowledge resources from authoritative copyrighted content libraries and reconstructs them into lightweight knowledge units suited to workplace scenarios, achieving secondary creation and large-scale production of knowledge assets. Finally, it employs an improved DSSM dual-tower model with a dynamic attention feature cross-layer and a two-stage screening strategy, combined with multi-source user profiles and enterprise context features, to deliver deeply personalized and precise push notifications.
Furthermore, through a cold-start knowledge graph and a real-time scenario triggering mechanism, it ensures comprehensive coverage of new employees and dynamic business needs.
[0212] Therefore, through systematic technological integration and innovation, this invention effectively solves the technical problems of existing corporate workplace learning systems, such as rigid structure, lagging content production, and single recommendation dimensions. It significantly improves the matching degree between learning content and corporate strategy, the efficiency of knowledge asset conversion, and the accuracy of personalized recommendations, providing a complete technical solution for enterprises to build an agile, intelligent, and continuously evolving talent training system in the digital age.
Claims
1. An AI-based enterprise personalized workplace learning system, characterized by comprising: an enterprise profile building module, used to collect internal and external multi-source data of the enterprise and perform semantic analysis through AI models to build a digital profile of the enterprise that represents its industry characteristics, core business, and cultural features; a dynamic learning system generation module, connected to the enterprise profile building module and used to input the enterprise digital profile into a large language model to automatically infer and generate a personalized learning system that matches the enterprise's strategic development path; a multimodal content extraction and AIGC production module, connected to a pre-set copyrighted content database and used, based on the needs of the learning system, to locate copyrighted content through retrieval-augmented generation technology and to reconstruct the copyrighted content into knowledge units suitable for workplace learning using an AIGC model; and a personalized recommendation engine, connected to both the dynamic learning system generation module and the multimodal content extraction and AIGC production module and used to filter and push personalized learning content from the generated knowledge units in combination with individual employee tags.
2. The AI-based enterprise personalized workplace learning system of claim 1, wherein the enterprise profile building module includes: a long-text semantic encoding unit, which uses a pre-trained BERT-large model to perform paragraph-level semantic encoding of industry research reports and policy documents; a short-text semantic encoding unit, which uses a Sentence-BERT model combined with a contrastive learning loss function to perform sentence-level semantic encoding of enterprise culture manuals and value keywords; and a business process modeling unit, which uses a graph neural network model to construct and encode a knowledge graph of business process SOP documents.
3. The AI-based enterprise personalized workplace learning system of claim 1, wherein the multimodal content extraction and AIGC production module specifically includes: a vectorized indexing submodule, used to convert copyrighted content into high-dimensional vectors; a semantic retrieval submodule, used to match learning needs with content fragments; and a content generation submodule, used to rewrite the original content fragments into micro-lessons, case studies, or scripts.
4. The AI-based enterprise personalized workplace learning system of claim 1, wherein the personalized recommendation engine specifically includes: a dual-tower model encoding unit, used to construct an improved DSSM (Deep Structured Semantic Model) dual-tower structure in which the user tower and the knowledge unit tower are encoded independently; a dynamic attention feature cross layer, embedded in the dual-tower model and used to dynamically weight multi-domain features before the user features and knowledge unit features interact, wherein the dynamic attention feature cross layer adopts the SENet (Squeeze-and-Excitation Networks) structure, obtains global information for each feature domain through a squeeze (compression) operation, learns an importance weight for each feature domain through an excitation operation, and applies these weights to the original features; a vector matching unit, used to calculate the cosine similarity between the weighted user feature vector and the knowledge unit feature vector to generate a matching score; and a loss function optimization unit, used to optimize the model parameters with a binary cross-entropy loss function so as to minimize the prediction error on positive and negative samples.
5. The AI-based enterprise personalized workplace learning system of claim 4, wherein the input features of the user tower and the knowledge unit tower include: static features, including job title, department, length of service, and historical learning completion rate; dynamic behavioral features, including vectorized representations of search keywords from the past 7 days, LSTM encodings of clickstream sequences, and dwell time on incomplete knowledge units; enterprise context features, including the enterprise strategic direction vector output by the enterprise profile building module and the semantic embedding of current business pain points; and a temporal decay weight, which assigns historical behavior features a weight that decays over time according to the exponential decay function $w(\Delta t) = e^{-\lambda \Delta t}$, where $\Delta t$ is the time interval from when the behavior occurred to the present and $\lambda$ is a decay coefficient.
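The scoring path of claims 4 and 5 can be sketched in plain Python. The claims fix the SENet structure, cosine matching, and binary cross-entropy; everything else here is an assumption: the squeeze step is mean pooling, the excitation step is two tiny fully connected layers with hand-picked weights, the cosine score is turned into a probability with a scaled sigmoid (scale `gamma`, assumed) before the loss is applied, and the temporal decay weight is assumed to take the common form w = exp(−λ·Δt). All function names are hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def senet_reweight(domains, w1, w2):
    """SENet over feature domains (static, behavioral, enterprise context, ...).

    domains: list of k domain embeddings; w1: r x k matrix; w2: k x r matrix.
    Returns the reweighted, concatenated vector and the k domain weights.
    """
    # Squeeze: compress each domain embedding to one scalar by mean pooling.
    z = [sum(e) / len(e) for e in domains]
    # Excitation: FC (k -> r) with ReLU, then FC (r -> k) with sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    weights = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Reweight: scale every element of domain i by weight i, then concatenate.
    reweighted = [a * x for a, e in zip(weights, domains) for x in e]
    return reweighted, weights

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one sample with label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def decay_weight(delta_t, lam):
    """Assumed exponential time decay w = exp(-lam * delta_t)."""
    return math.exp(-lam * delta_t)

# Three user feature domains with 2 dimensions each (toy values).
user_domains = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
w1 = [[0.5, 0.5, 0.5]]        # k=3 -> r=1
w2 = [[1.0], [0.0], [-1.0]]   # r=1 -> k=3
user_vec, domain_weights = senet_reweight(user_domains, w1, w2)

unit_vec = [0.6, 0.1, 0.4, 0.4, 0.1, 0.2]   # knowledge-unit tower output (toy values)
score = cosine(user_vec, unit_vec)
gamma = 5.0                                  # assumed scale for sigmoid(gamma * cosine)
loss_if_clicked = bce(sigmoid(gamma * score), 1)

# Decay-weighted pooling of two historical keyword vectors seen 1 and 6 days ago.
history = [([1.0, 0.0], 1.0), ([0.0, 1.0], 6.0)]
w = [decay_weight(dt, lam=0.2) for _, dt in history]
pooled = [sum(wi * vec[j] for wi, (vec, _) in zip(w, history)) / sum(w) for j in range(2)]
```

Because the towers stay independent until the cosine step, knowledge-unit vectors can be precomputed and indexed, which is the usual motivation for a dual-tower design.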
6. The AI-based personalized workplace learning system for enterprises according to claim 4, characterized in that the personalized recommendation engine further includes a feature cross-enhancement unit, used to dynamically weight the multi-domain features within the user tower using the SENet structure and then explicitly cross the weighted features, specifically by: obtaining the SENet-weighted user feature vector; performing pairwise interaction operations between feature domains through a feature cross layer to generate a cross feature vector; and concatenating the cross feature vector with the original user tower vector to obtain an enhanced user target vector, which is used in the matching calculation against the knowledge unit vector.
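Claim 6 does not fix the form of the pairwise interaction. The sketch below assumes an element-wise (Hadamard) product for each pair of SENet-weighted domains, one common choice in feature-crossing models, and appends the crosses after the original user tower vector; `cross_features` and `enhanced_user_vector` are hypothetical names.

```python
from itertools import combinations

def cross_features(weighted_domains):
    """Hadamard product for every unordered pair of equal-length domain vectors."""
    crosses = []
    for a, b in combinations(weighted_domains, 2):
        crosses.extend(x * y for x, y in zip(a, b))
    return crosses

def enhanced_user_vector(user_tower_vec, weighted_domains):
    """Concatenate the original user tower vector with the cross feature vector."""
    return list(user_tower_vec) + cross_features(weighted_domains)

weighted = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.5]]   # toy SENet-weighted domains
tower = [x for d in weighted for x in d]          # toy original user tower vector
enhanced = enhanced_user_vector(tower, weighted)
```

With k domains of dimension d, the cross part has C(k, 2)·d entries, so the enhanced vector grows quickly; in practice a projection layer (an assumption, not stated in the claim) would map it back to the knowledge-unit tower's dimension before cosine matching.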
7. The AI-based personalized workplace learning system for enterprises according to claim 1, characterized in that it further includes a data update mechanism module, configured to perform incremental updates on the personalized recommendation engine and the enterprise profile building module, the data update mechanism module including: an incremental data acquisition unit, used to collect newly generated user behavior data, enterprise business data, and newly added copyrighted content data in real time or near real time; an update trigger judgment unit, used to determine whether to trigger a model update based on a preset update strategy, wherein the update strategy includes timed triggering and event triggering; and an incremental learning unit, used to train the recommendation model incrementally with a lifelong learning strategy, avoiding the computational cost of full retraining. The incremental learning unit specifically includes: a replay sample subunit, used to sample representative historical data from the historical sample library, mix it with the newly added sample data, and then fine-tune the model to alleviate catastrophic forgetting; a parameter regularization subunit, used to introduce the Elastic Weight Consolidation (EWC) algorithm during incremental training, which applies regularization constraints to changes in the model parameters and protects the parameters that are important for historical tasks; and a model version management subunit, used to save a snapshot of the model version after each incremental update, supporting model rollback and A/B testing.
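The three subunits of the incremental learning unit can be sketched as follows. This assumes the standard diagonal EWC penalty, (λ/2)·Σᵢ Fᵢ(θᵢ − θ*ᵢ)², where θ* are the parameters saved after the previous update and Fᵢ is a diagonal Fisher information estimate. Replay samples are drawn uniformly at random, a simplification of the "representative" sampling in the claim. All names are hypothetical, and parameters are plain lists of floats.

```python
import random

def ewc_penalty(params, anchor_params, fisher, lam):
    """Diagonal EWC term: (lam / 2) * sum_i F_i * (theta_i - theta_star_i) ** 2."""
    return 0.5 * lam * sum(f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher))

def incremental_loss(task_loss, params, anchor_params, fisher, lam):
    """Loss minimized during incremental training: new-task loss plus EWC penalty."""
    return task_loss + ewc_penalty(params, anchor_params, fisher, lam)

def replay_mix(new_samples, history, replay_ratio, rng):
    """Mix new samples with a random subset of historical samples before fine-tuning."""
    k = min(len(history), int(len(new_samples) * replay_ratio))
    batch = list(new_samples) + rng.sample(history, k)
    rng.shuffle(batch)
    return batch

class ModelRegistry:
    """Stores a parameter snapshot after each incremental update for rollback or A/B tests."""

    def __init__(self):
        self._snapshots = []

    def save(self, params):
        self._snapshots.append(list(params))
        return len(self._snapshots) - 1   # version id

    def rollback(self, version):
        return list(self._snapshots[version])

rng = random.Random(0)
batch = replay_mix([("new", i) for i in range(4)], [("old", i) for i in range(10)], 0.5, rng)
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], [2.0, 5.0], lam=1.0)
registry = ModelRegistry()
v0 = registry.save([0.1, 0.2])
v1 = registry.save([0.3, 0.4])
```

Only the second parameter has a large Fisher weight in the toy call, but it did not move, so the whole penalty comes from the first parameter; this is how EWC lets unimportant parameters drift while pinning important ones.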
8. The AI-based personalized workplace learning system for enterprises according to claim 7, characterized in that the update trigger judgment unit is configured to determine whether to perform an incremental update based on the classification results of user groups and knowledge unit groups, specifically by classifying new activity information according to user group, knowledge unit group, and update time threshold; if the amount of new interaction data for a user group reaches a preset scale within the most recent update time threshold, a local incremental update of the model for that user group is triggered instead of a global model update, in order to reduce computational overhead.
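The local-versus-global decision in claim 8 reduces to counting recent interactions per group. A minimal sketch, assuming events arrive as (group, timestamp) pairs and that the hypothetical parameters `window` and `min_events` stand for the update time threshold and the preset scale; group keys could equally be (user group, knowledge-unit group) tuples.

```python
from collections import defaultdict

def groups_to_update(events, now, window, min_events):
    """Return the groups whose new interactions within `window` reach `min_events`.

    events: iterable of (group, timestamp) pairs collected since the last update.
    Groups below the threshold are left untouched, so only local updates are triggered.
    """
    counts = defaultdict(int)
    for group, ts in events:
        if now - ts <= window:
            counts[group] += 1
    return sorted(g for g, c in counts.items() if c >= min_events)

events = [("sales", 100), ("sales", 150), ("sales", 190), ("ops", 195), ("ops", 20)]
to_update = groups_to_update(events, now=200, window=120, min_events=2)
```

Here only `sales` has enough recent activity; the older `ops` event falls outside the window, so the `ops` partition of the model is not retrained.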
9. The AI-based personalized workplace learning system for enterprises according to claim 1, characterized in that it further includes a cold-start recommendation unit, used to make initial recommendations for new employees or employees without historical behavioral data. The cold-start recommendation unit is configured to: obtain, from the dynamic learning system generation module, the standard learning path for the new employee's position based on the new employee's job tag and department tag; map the new employee's attribute features into the job knowledge graph space using a knowledge graph-based embedding method; find the group of mature employees whose attributes are most similar to the new employee's; generate an initial recommendation list based on that group's preferences; and reorder the initial list according to the company's current strategic priorities, giving priority to knowledge units strongly related to the company's strategic direction.
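The cold-start flow of claim 9 can be sketched as neighbor-based scoring followed by a strategy-aware re-rank. This sketch assumes that the new employee's attributes are already embedded in the job knowledge-graph space, that candidates are restricted to the standard learning path for the position, that cosine similarity identifies the most similar mature employees, and that the final score is a linear blend controlled by a hypothetical weight `alpha`; none of these specifics come from the claim.

```python
import math
from collections import defaultdict

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cold_start_recommend(new_attr, mature, standard_path, strategic_relevance, top_n=2, alpha=0.5):
    """Rank the units on a position's standard path for an employee with no history.

    mature: dict employee_id -> (attribute embedding, {unit_id: preference score}).
    strategic_relevance: dict unit_id -> relevance to current strategic priorities in [0, 1].
    alpha: hypothetical blend weight for the strategic re-rank.
    """
    # 1. Most similar mature employees in the (already embedded) attribute space.
    neighbors = sorted(mature.values(), key=lambda m: cosine(new_attr, m[0]), reverse=True)[:top_n]
    # 2. Group preference: similarity-weighted sum of the neighbors' preference scores.
    group = defaultdict(float)
    for emb, prefs in neighbors:
        sim = cosine(new_attr, emb)
        for unit, score in prefs.items():
            group[unit] += sim * score
    peak = max(group.values(), default=0.0) or 1.0
    # 3. Re-rank: blend normalized group preference with strategic relevance.
    scores = {
        u: (1 - alpha) * group.get(u, 0.0) / peak + alpha * strategic_relevance.get(u, 0.0)
        for u in standard_path
    }
    return sorted(standard_path, key=scores.get, reverse=True)

new_attr = [1.0, 0.0]
mature = {
    "e1": ([1.0, 0.0], {"u1": 1.0, "u2": 0.1}),
    "e2": ([0.0, 1.0], {"u3": 1.0}),
    "e3": ([0.8, 0.6], {"u2": 1.0}),
}
standard_path = ["u1", "u2", "u4"]
strategic = {"u1": 0.0, "u2": 1.0, "u4": 0.9}
ranked = cold_start_recommend(new_attr, mature, standard_path, strategic)
```

With `alpha=0.0` the order follows peer preference alone (`u1` first); raising `alpha` promotes the strategically relevant `u2`, which is the re-ordering step the claim describes.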
10. An AI-based personalized workplace learning method for enterprises, implemented based on the system described in claim 1, characterized in that it comprises the following steps: S1: Construct a digital profile of the enterprise covering industry, business, and culture, specifically including: S1.1: Multi-source heterogeneous data collection. External industry research reports, policy documents, competitor dynamics, and market sentiment are collected through a distributed crawler system that complies with the Robots Exclusion Protocol (robots.txt). At the same time, the system connects to the enterprise database over an encrypted connection through a JDBC connection pool to extract the organizational structure, job descriptions, performance appraisal results, business process SOP documents, anonymized internal communication records, and corporate culture manuals. S1.2: Data cleaning and preprocessing. Rule-based data cleaning algorithms perform range verification, missing-value handling, and outlier detection on numerical data. For text data, HTML tags and special symbols are removed with regular expressions, Chinese text is segmented with the jieba word segmenter together with a custom dictionary of enterprise terminology, stop words are removed, and English terms are stemmed.
S1.3: Multimodal semantic analysis, in which a hybrid architecture encodes the cleaned data. S1.4: Cross-modal semantic alignment. A trimodal contrastive learning framework is constructed that takes the industry vector, culture vector, and business vector of the same enterprise as a positive sample triple and randomly samples the modal vectors of other enterprises as negative samples; training with the InfoNCE contrastive loss maps the three modal features into a unified semantic space. S1.5: Hierarchical attention feature fusion. A multi-head self-attention mechanism calculates the interaction weights of different feature subspaces, the SENet structure learns the importance weight of each feature domain, and the weighted industry feature vector, business capability graph vector, and cultural value dimension vector are concatenated to form the final multi-dimensional digital profile vector of the enterprise and generate an interpretable tag-based representation.
S2: Based on a large language model, generate a customized workplace learning system from the enterprise profile, specifically including: S2.1: Prompt engineering. The multi-dimensional digital profile of the enterprise is converted into a structured prompt template with a three-part structure: the first part defines the task objectives, the middle part lists the key attributes of the enterprise profile as key-value pairs, and the last part specifies the output format requirements. S2.2: Chain-of-thought guided reasoning. A four-step reasoning process is designed to guide the large model through the task step by step. S2.3: Retrieval-augmented generation. Before the prompt is input into the large language model, vector similarity retrieval is used to fetch the reference cases most relevant to the current enterprise profile from the enterprise knowledge base, which includes the industry best-practice library, the historical training record library, and the performance improvement case library. After a dual recall strategy balances relevance and diversity, the cases are concatenated into the prompt as contextual information. S2.4: Structured output of the learning system. A multi-level learning system is output in JSON-LD format: the top layer is the capability domain, the middle layer is the capability item, and the bottom layer is the capability indicator and learning objective. At the same time, a mapping between job positions, capabilities, and learning content is constructed, and the job learning path graph is stored as a directed graph in which nodes represent learning units and edges represent prerequisite (predecessor and successor) relationships and dependencies. Key knowledge domains are extracted from the capability model and labeled with metadata such as relevance to business scenarios, difficulty level, and expected learning time. S2.5: Dynamic system evolution. The learning system is iterated dynamically through two mechanisms: periodic triggering automatically rebuilds the system on a preset cycle, while event triggering responds to real-time events from the enterprise business system by performing local incremental updates. A version control strategy manages the iteration history of the system and supports A/B testing and system rollback.
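Storing the job learning path as a directed graph (S2.4) means that a valid study order is a topological sort of that graph. The sketch below uses Kahn's algorithm, a standard technique chosen here for illustration rather than taken from the source; `learning_order` and the unit names are hypothetical, and a cycle in the prerequisite edges is reported as an error because it would make the path impossible to complete.

```python
from collections import deque

def learning_order(units, prerequisites):
    """Topologically sort learning units with Kahn's algorithm.

    units: node ids of the job learning path graph.
    prerequisites: (before, after) edges meaning `before` must be learned first.
    Raises ValueError if the edges contain a cycle.
    """
    indegree = {u: 0 for u in units}
    successors = {u: [] for u in units}
    for before, after in prerequisites:
        successors[before].append(after)
        indegree[after] += 1

    queue = deque(u for u in units if indegree[u] == 0)  # units with no prerequisites
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(units):
        raise ValueError("prerequisite graph contains a cycle")
    return order

units = ["policy-basics", "product-101", "sales-pitch", "key-account"]
edges = [("policy-basics", "sales-pitch"), ("product-101", "sales-pitch"), ("sales-pitch", "key-account")]
path = learning_order(units, edges)
```

Running the same check after each incremental update in S2.5 would catch an edit that accidentally introduces a circular dependency before it reaches employees.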
S3: Based on RAG and AIGC technologies, extract and generate high-density workplace knowledge units from its own copyrighted content library, specifically including: S3.1: Build a vectorized index of the copyrighted content, applying a differentiated processing strategy to content of different modalities. S3.2: RAG retrieval and recall. After a content production request is received, it is converted into a query vector with the same embedding model, and approximate nearest neighbor retrieval is performed in the vector database. A Maximal Marginal Relevance (MMR) diversity penalty factor is introduced to balance relevance and diversity, and the top K content fragments most semantically relevant to the request are recalled. S3.3: Multi-agent collaborative content reorganization and rewriting, in which a multi-agent collaborative architecture processes the original fragments. S3.4: Knowledge traceability and copyright protection. During content generation, metadata of the original fragments, such as document ID, paragraph position, and copyright information, are retained and embedded in the file attributes of the generated content. Copyright tracking is achieved by applying DCT-based frequency-domain watermarking to video content and synonym-substitution semantic watermarking to text content. S3.5: Multi-dimensional quality assessment, which automatically evaluates the generated knowledge units.
S4: Combining employee tags, push the knowledge units to employee terminals in a personalized manner through a recommendation model.
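The MMR re-ranking in S3.2 can be illustrated with a small greedy selector. Each step picks the fragment d that maximizes λ·sim(q, d) − (1 − λ)·max over already-selected s of sim(d, s). This sketch assumes cosine similarity and replaces the approximate nearest neighbor search with an exhaustive scan over a toy candidate set; `mmr_select` and the fragment ids are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mmr_select(query, candidates, k, lam=0.7):
    """Greedy Maximal Marginal Relevance selection.

    candidates: dict fragment_id -> embedding. Returns up to k fragment ids.
    lam weights relevance to the query against redundancy with already-selected fragments.
    """
    selected = []
    pool = list(candidates)

    def mmr_score(fid):
        relevance = cosine(query, candidates[fid])
        redundancy = max((cosine(candidates[fid], candidates[s]) for s in selected), default=0.0)
        return lam * relevance - (1.0 - lam) * redundancy

    while pool and len(selected) < k:
        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)
    return selected

query = [1.0, 0.8, 0.0]
fragments = {
    "a": [1.0, 0.0, 0.0],
    "a_dup": [1.0, 0.05, 0.0],   # near-duplicate of "a"
    "b": [0.0, 1.0, 0.0],
}
diverse = mmr_select(query, fragments, k=2, lam=0.5)
plain = mmr_select(query, fragments, k=2, lam=1.0)
```

With `lam=1.0` the selector reduces to plain relevance ranking and returns the near-duplicate pair (`a_dup`, `a`); with `lam=0.5` the redundancy penalty swaps `a` for the less relevant but distinct fragment `b`.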