A natural language-based data backup management method and device
The method constructs a backup domain intent vector set using semantic vector encoding and a bidirectional contrast anchor calibration mechanism. Combined with an intent-tool mapping and a temporal co-occurrence graph, it optimizes tool selection, addressing the complexity and low accuracy of existing data backup management technologies and enabling efficient, low-cost data backup management.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SICHUAN JINGRONG SHUAN SCI & TECH CO LTD
- Filing Date
- 2026-03-27
- Publication Date
- 2026-06-30
AI Technical Summary
Existing data backup management systems are complex to operate, difficult to troubleshoot, and make knowledge acquisition inconvenient. Existing attempts to apply artificial intelligence to data backup management suffer from incomplete rule coverage, low accuracy, and high rule maintenance costs.
A natural language-based data backup management method is adopted. A backup domain intent vector set is constructed through semantic vector encoding and a bidirectional contrast anchor calibration mechanism. Combined with an intent-tool mapping table and a tool invocation mechanism, this realizes intent classification and tool recommendation. Anchors are dynamically updated to adapt to user expression habits, and tool selection is optimized through a tool predecessor-successor temporal co-occurrence graph.
The method improves the accuracy and efficiency of data backup management, reduces rule maintenance costs, decreases token consumption for tool selection, and enhances model adaptability and operational accuracy.
Figure CN122309236A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of data backup technology, and in particular to a data backup management method and apparatus based on natural language.
Background Technology
[0002] Traditional data backup management systems suffer from high operational complexity, difficult troubleshooting, inconvenient knowledge acquisition, and cumbersome batch operations in actual operation and maintenance. To address these issues, existing technologies attempt to incorporate artificial intelligence to improve the user experience. The main solutions are: command parsing based on keyword matching; direct application of general large language models; hierarchical intent recognition models based on tree structures with secondary keyword verification; and methods based on static anchor point queries. Each has shortcomings. Keyword matching has limited coverage and cannot handle synonyms or colloquial expressions, and its rule maintenance costs are high because rules must be added manually for each new expression. A general model is unfamiliar with the API interfaces and data structures of a specific backup system and fails to generate executable operations; when the number of tools is large, the model's selection accuracy decreases and token consumption is high. In the hierarchical intent recognition model, each intent node is a static label that fails to capture the semantic diversity of user expressions; static labels are difficult to match when users use synonyms, colloquial expressions, or multilingual input, so keyword rules must be added one by one, resulting in high rule maintenance costs and incomplete coverage. Static anchor point queries are generated once through template derivation before system deployment; they cannot adapt to the evolution of user expression habits in actual use and do not consider the frequency differences between intent categories, which reduces intent recognition accuracy.
[0003] Therefore, a natural language-based data backup management method and device is needed to solve the above problems.
Summary of the Invention
[0004] This invention proposes a data backup management method and apparatus based on natural language to solve the problems of incomplete rule coverage, low accuracy, and high maintenance costs in the prior art.
[0005] The present invention achieves the above objectives through the following technical solution. This invention discloses a data backup management method based on natural language, comprising:
Obtaining the user's natural language input, which is the user's request to the data backup system;
Performing semantic vector encoding on the natural language input to obtain a semantic vector;
Based on nearest neighbor retrieval, calculating the vector distance between the semantic vector and a preset backup domain intent vector set to determine the intent category of the user request, where the backup domain intent vector set includes knowledge consultation intent vectors, operation intent vectors, and general dialogue intent vectors;
If the intent category is knowledge consultation, retrieving document fragments relevant to the user's intent category from the backup knowledge base and inputting the retrieval results into the large language model to generate a response;
If the intent category is general dialogue, inputting the user's natural language input directly into the large language model to generate a response;
If the intent category is operation execution, selecting a subset of tools related to the current intent from the full toolset based on a preset intent-tool mapping table, where the full toolset includes query tools and operation tools; inputting the user's natural language input and the definition information of the tool subset into the large language model, which outputs a tool call request; if the tool call request is a query tool call request, executing it directly; if it is an operation tool call request, asking the user for confirmation and executing the tool after the user confirms.
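The classification and routing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source only says "vector distance", so cosine distance is assumed here, and the function names, category labels, and returned action strings are hypothetical.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def classify_intent(vec, anchors):
    """Nearest neighbor retrieval: return (category, distance) of the closest anchor."""
    return min(
        ((name, cosine_distance(vec, anchor)) for name, anchor in anchors.items()),
        key=lambda pair: pair[1],
    )

def route(category, tool_kind=None, user_confirmed=False):
    """Map an intent category to the next action (action names are hypothetical)."""
    if category == "knowledge_consultation":
        return "retrieve_documents_then_generate"
    if category == "general_dialogue":
        return "generate_directly"
    # Operation execution: query tools run directly, operation tools need confirmation.
    if tool_kind == "query" or user_confirmed:
        return "execute"
    return "ask_user_confirmation"

# Toy 3-dimensional anchors; real anchors would come from a sentence vector model.
anchors = {
    "knowledge_consultation": [1.0, 0.0, 0.0],
    "operation_execution": [0.0, 1.0, 0.0],
    "general_dialogue": [0.0, 0.0, 1.0],
}
category, distance = classify_intent([0.9, 0.1, 0.0], anchors)
```

Keeping the confirmation gate in `route` rather than in each tool keeps the "query runs directly, operation waits for the user" rule in one place.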
[0006] Furthermore, semantic vector encoding is performed on the natural language input to obtain semantic vectors, including: using a pre-trained multilingual sentence vector model to map the user input to a high-dimensional semantic vector space, generating a dense vector representation of fixed dimensions.
[0007] Furthermore, the backup domain intent vector set is constructed and dynamically updated using a bidirectional contrast anchor calibration mechanism, the mechanism including: For each intent category, obtain the sample text set corresponding to the intent category, encode the sample text into vectors through a pre-trained sentence vector model, then calculate the centroid of all vectors, use the centroid as the initial anchor point, and normalize all anchor points. Once the intent category of the user's request is correctly classified and the user confirms the execution, the semantic vector corresponding to the user's request is used as a positive sample, and the anchor point is moved in the direction of the positive sample to achieve anchor point update. If the intent category of a user's request is misclassified, the semantic vector corresponding to the user's request is used as a negative sample, and the anchor point is moved away from the negative sample to update the anchor point. After each anchor point update, the anchor point is normalized.
[0008] Furthermore, after each anchor point update, the distance between adjacent intent anchor points is checked. If the distance between any two anchor points is less than the preset safety margin, the two anchor points are moved apart along the line connecting them.
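A minimal sketch of this boundary check, assuming Euclidean distance between anchors and re-normalization to unit length after the move (the paragraph itself does not fix either choice). The function name `separate` and the handling of coincident anchors are hypothetical; the default values 0.3 and 0.1 are the safety margin and step size given elsewhere in the description.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    n = norm(v)
    return [x / n for x in v]

def separate(ci, cj, gamma_min=0.3, lam=0.1):
    """Push two anchors apart along their connecting line if they are too close."""
    diff = [b - a for a, b in zip(ci, cj)]
    d = norm(diff)
    # Assumption: coincident anchors have no defined direction and are left unchanged.
    if d >= gamma_min or d == 0.0:
        return ci, cj
    unit = [x / d for x in diff]  # unit direction from ci to cj
    ci_new = [a - lam * u for a, u in zip(ci, unit)]
    cj_new = [b + lam * u for b, u in zip(cj, unit)]
    # Assumption: anchors are projected back onto the unit hypersphere afterwards.
    return normalize(ci_new), normalize(cj_new)

ci = normalize([1.0, 0.0])
cj = normalize([1.0, 0.1])
before = norm([b - a for a, b in zip(ci, cj)])
ci2, cj2 = separate(ci, cj)
after = norm([b - a for a, b in zip(ci2, cj2)])
```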
[0009] The backup domain intent vector set employs a bidirectional comparison anchor point calibration mechanism, specifically including:
Step 1, Initialization Phase: For each intent category i, collect multiple representative sample texts of that category, encode them into vectors using a pre-trained sentence vector model, and calculate the centroid (i.e., the arithmetic mean) as the initial anchor point Ci(0). The initialization formula is:
Ci(0) = (1 / ni)·Σ_{j=1..ni} vij;
Where ni is the initial number of samples for the i-th intent category, vij is the semantic vector of the j-th sample for the i-th intent category, and Σ represents the summation over j from j=1 to j=ni. After initialization, the anchor points are normalized (i.e., the vectors are scaled to unit length) so that all anchor points lie on the unit hypersphere:
Ci(0) = Ci(0) / ||Ci(0)||;
Where ||·|| represents the Euclidean norm of the vector (i.e., the length of the vector).
Step 2, Positive Attraction Update: After the user input is correctly classified as intent i and the user confirms execution, the input vector vpos is used as a positive sample (i.e., a correctly classified example). The anchor point is moved towards this sample, and the update formula is:
Ci(t+1) = Ci(t) + α·freq_factor(i)·(vpos - Ci(t))·1[sim<τupper];
Where Ci(t) is the vector value of the i-th intent anchor point at the t-th update; Ci(t+1) is the updated anchor point vector value; α is the basic learning rate, which controls the magnitude of each update and ranges from 0.01 to 0.1; vpos is the semantic vector (positive sample) input by the user; (vpos - Ci(t)) is the direction vector from the anchor point to the positive sample; 1[sim<τupper] is the indicator function (i.e., the conditional judgment function), which takes a value of 1 when the similarity sim is less than the upper bound threshold τupper (default value 0.95) and 0 otherwise, to prevent the anchor point from moving excessively towards samples that are already sufficiently similar; freq_factor(i) is the intent frequency factor (i.e., the learning step size adjustment factor based on the call frequency), calculated as:
freq_factor(i) = log(Ntotal / Ni + 1) / log(Ntotal + 1);
Where Ntotal is the total number of calls to all intents, Ni is the historical number of calls to the i-th intent, and log represents the natural logarithm function. The design principle of the intent frequency factor is as follows: when the number of intent calls Ni is small (low-frequency intent), Ntotal / Ni is large, freq_factor is close to 1, and the learning step size is large; when Ni is large (high-frequency intent), freq_factor is close to 0, and the learning step size is small. This design reflects the operational characteristics of the backup domain: query intents are triggered frequently but their expression is relatively stable, so there is no need to adjust the anchor point significantly; creation / deletion intents are triggered infrequently but their expression changes frequently, so a larger learning step size is needed to quickly adapt to the new expression.
Step 3, Reverse Rejection Update: When the user input is misclassified as intent i but the user actually expects intent j, the input vector vneg is treated as a negative sample (i.e., a misclassified example), and the anchor point is moved away from that sample. The update formula is:
Ci(t+1) = Ci(t) - β·(vneg - Ci(t)) / ||vneg - Ci(t)||²;
Where: β is the rejection coefficient, which controls the rejection strength of negative samples, and its value ranges from 0.005 to 0.05; vneg is the user input vector (negative sample) that is misclassified; (vneg - Ci(t)) is the direction vector from the anchor point to the negative sample; ||vneg - Ci(t)||² is the square of the distance between the anchor point and the negative sample. The design principle of dividing by the square of the distance is as follows: negative samples that are closer to the anchor point indicate problems with the classification boundary and require a strong repulsive force to push the anchor point away; negative samples that are farther away have less impact on the anchor point position and do not require significant adjustment.
Step 4, Anchor Point Normalization Constraint: Perform normalization after each update to ensure that the anchor point vector remains on the unit hypersphere:
Ci(t+1) = Ci(t+1) / ||Ci(t+1)||;
This constraint ensures that all anchor points have the same vector length, making the vector distance calculation results consistent and comparable.
Step 5, Boundary Maintenance Constraints: After each update, check the distance between adjacent intent anchor points. If the distance between any two anchor points Ci and Cj is less than the safety margin γmin (default value 0.3), then move the two anchor points outward along the connecting line direction:
Ci = Ci - λ·(Cj - Ci) / ||Cj - Ci||;
Cj = Cj + λ·(Cj - Ci) / ||Cj - Ci||;
Where λ is the boundary adjustment step size (default value 0.1), and (Cj - Ci) / ||Cj - Ci|| is the unit direction vector from Ci to Cj. This constraint ensures that there is sufficient discriminative power between intents and prevents the classification boundary from being blurred by excessive clustering of anchor points.
The following technical effects are achieved through the bidirectional comparison anchor calibration mechanism: (1) The anchor vector can continuously learn new expression patterns based on user feedback without retraining the entire model; (2) Bidirectional adjustment of positive and negative samples improves the clarity of the classification boundary. Positive samples bring the anchor closer to make it more representative, while negative samples push the anchor away to avoid confusion in the region; (3) The intent frequency factor enables low-frequency intents to obtain a larger learning step size, solving the cold start problem of insufficient long-tail intent samples; (4) Boundary maintenance constraints avoid ambiguity caused by intent overlap and maintain the stability of classification accuracy.
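Steps 1 to 4 can be sketched in a few lines. This is an illustrative sketch only: it assumes that sim is cosine similarity, that Ni is at least 1, and that normalization is applied immediately after each move. The function names and the chosen α and β values (picked from inside the stated ranges) are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    n = norm(v)
    return [x / n for x in v]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def init_anchor(samples):
    """Step 1: centroid of the sample vectors, scaled to unit length."""
    n = len(samples)
    return normalize([sum(col) / n for col in zip(*samples)])

def freq_factor(n_total, n_i):
    """Intent frequency factor; assumes n_i >= 1 and n_total >= 1."""
    return math.log(n_total / n_i + 1) / math.log(n_total + 1)

def attract(anchor, v_pos, n_total, n_i, alpha=0.05, tau_upper=0.95):
    """Step 2: move towards a positive sample unless it is already very similar."""
    if cosine(anchor, v_pos) >= tau_upper:  # indicator function evaluates to 0
        return anchor
    step = alpha * freq_factor(n_total, n_i)
    return normalize([c + step * (p - c) for c, p in zip(anchor, v_pos)])  # Step 4

def repel(anchor, v_neg, beta=0.02):
    """Step 3: move away from a negative sample, scaled by 1 / squared distance."""
    diff = [n - c for c, n in zip(anchor, v_neg)]
    d2 = sum(x * x for x in diff)
    if d2 == 0.0:  # assumption: identical vectors give no direction, so skip
        return anchor
    return normalize([c - beta * x / d2 for c, x in zip(anchor, diff)])  # Step 4

anchor = init_anchor([[1.0, 0.0], [0.8, 0.2]])
sample = normalize([0.6, 0.8])
pulled = attract(anchor, sample, n_total=100, n_i=10)
pushed = repel(anchor, sample)
```

With Ntotal = 100, an intent called once gets a factor of exactly 1, while an intent that accounts for all 100 calls gets log(2) / log(101), roughly 0.15, which is the damping for high-frequency intents that the text describes.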
[0010] Furthermore, selecting a subset of tools relevant to the current intent from the full toolset includes:
Constructing a tool predecessor-successor temporal co-occurrence graph whose nodes are tool nodes;
Extracting a first tool candidate set based on the intent-tool association matrix, and a second tool candidate set based on the predecessor-successor temporal co-occurrence graph: if the session context of the user's natural language input contains a previous tool call, the successor tools of that tool are extracted; otherwise, the target tool reverse predecessor query mechanism is activated, and the high-frequency predecessor tools of the target tool are queried in the graph;
Merging the first tool candidate set and the second tool candidate set to obtain the merged candidate set, and calculating the comprehensive score of each tool node in the merged candidate set.
Specifically, the tool predecessor-successor temporal co-occurrence graph GT=(VT,ET) is a directed weighted graph used to model the calling order relationship between tools, and is defined as follows: VT is the set of tool nodes, where each node represents a tool in the system; if there are M tools in the system, then |VT| = M. ET is the set of directed edges; if tool j has been called before tool k, then there is a directed edge (j,k) from j to k. Each edge (j,k) has a weight w(j,k), which represents the strength of the association between tool j and tool k.
Incremental update process of edge weights: After the system records a tool call sequence, the edge weights are updated as follows. For any two tools j and k in the sequence, if j is called before k, then the edge weight is updated:
w(j,k) = w(j,k) + e^(-δ·gap(j,k))·ψ(typej,typek);
Where: e is the natural constant (approximately equal to 2.718); δ is the temporal distance decay factor, with a default value of 0.3, which controls the degree of influence of the call interval on the weight; gap(j,k) is the difference in the number of call steps between tool j and tool k, i.e., the number of intervening calls plus 1; e^(-δ·gap(j,k)) is the exponential decay function: the larger gap(j,k) is, the smaller the weight contribution, reflecting the design concept that "tools called in close proximity are more related"; and ψ(typej,typek) is the backup operation type matching factor, which assigns different weights according to the combination of tool types.
The rules for determining the value of the backup operation type matching factor ψ are as follows:
ψ(typej,typek)=1.5, when typej=QUERY (query class) and typek=CREATE (creation class);
ψ(typej,typek)=1.3, when typej=QUERY (query class) and typek=EXECUTE (execution class);
ψ(typej,typek)=1.2, when typej=QUERY (query class) and typek=DELETE (delete class);
ψ(typej,typek)=1.0, for all other type combinations.
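The incremental update can be illustrated with a short sketch, assuming edge weights are held in a dictionary keyed by (predecessor, successor) pairs. The function name `update_edges` and the tool names `list_client_nodes` and `list_storage_nodes` are hypothetical; `create_backup_strategy` appears in the description, and the ψ values and δ default are taken from the text.

```python
import math
from collections import defaultdict

# Backup operation type matching factor; any other combination falls back to 1.0.
PSI = {
    ("QUERY", "CREATE"): 1.5,
    ("QUERY", "EXECUTE"): 1.3,
    ("QUERY", "DELETE"): 1.2,
}

def update_edges(weights, sequence, tool_types, delta=0.3):
    """Add a decayed, type-weighted contribution for every ordered pair in a sequence."""
    for a in range(len(sequence)):
        for b in range(a + 1, len(sequence)):
            j, k = sequence[a], sequence[b]
            gap = b - a  # number of intervening calls plus 1
            psi = PSI.get((tool_types[j], tool_types[k]), 1.0)
            weights[(j, k)] += math.exp(-delta * gap) * psi
    return weights

tool_types = {
    "list_client_nodes": "QUERY",
    "list_storage_nodes": "QUERY",
    "create_backup_strategy": "CREATE",
}
weights = update_edges(
    defaultdict(float),
    ["list_client_nodes", "list_storage_nodes", "create_backup_strategy"],
    tool_types,
)
```

In this sequence the edge from `list_storage_nodes` to `create_backup_strategy` (gap 1) receives a larger contribution than the edge from `list_client_nodes` (gap 2), although both carry the same ψ of 1.5.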
[0011] This design reflects typical operational patterns in the backup field: before creating a backup strategy, it is usually necessary to query available client nodes and storage nodes; before executing a backup task, it is usually necessary to query the backup strategy; before deleting, it is usually necessary to query the detailed information of the target object. The system gives higher weight to these common "query first, then operate" patterns, making tool recommendations more in line with actual operating habits.
Periodic normalization of edge weights: After every N accumulated session calls (default N=100), the edge weights are normalized:
w_norm(j,k) = w(j,k) / Σ_s w(j,s);
That is, the weights of all outgoing edges of each tool node are normalized into a probability distribution, making the weight values comparable.
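The normalization step is a per-node division by the sum of outgoing weights. A minimal sketch, assuming the same dictionary-of-pairs representation; the function name and the single-letter tool names are placeholders.

```python
from collections import defaultdict

def normalize_out_edges(weights):
    """Turn each node's outgoing edge weights into a probability distribution."""
    totals = defaultdict(float)
    for (j, _k), w in weights.items():
        totals[j] += w
    return {(j, k): w / totals[j] for (j, k), w in weights.items() if totals[j] > 0}

raw = {("a", "b"): 1.0, ("a", "c"): 3.0, ("b", "c"): 2.0}
w_norm = normalize_out_edges(raw)
```

Here node `a` has outgoing weights 1.0 and 3.0, which become 0.25 and 0.75, and the single outgoing edge of `b` becomes 1.0.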
[0012] Extracting the first tool candidate set based on the intent-tool association matrix: Based on the intent classification result i, extract the first tool candidate set S1 from the pre-constructed intent-tool association matrix W[i][j]: S1={j|W[i][j]>τ1}; Wherein, the intent-tool relevance matrix W[i][j] represents the relevance score between intent i and tool j, τ1 is the relevance threshold (default 0.3), and S1 contains all tools whose relevance to the current intent exceeds the threshold; Extracting the second tool candidate set based on the time-series co-occurrence graph: If there is a previous tool call in the session context (denoted as tool_prev), then the successor tool candidate set of that tool is extracted from the time-series co-occurrence graph, which is the second tool candidate set. S2={k|w_norm(prev,k)>τ2}; Where w_norm(prev,k) is the normalized edge weight from tool prev to tool k, τ2 is the co-occurrence threshold (default 0.1), and S2 contains tools that have historically been frequently called after tool_prev. If there is no previous tool call in the session, the target tool reverse predecessor query mechanism is enabled: the target tool is determined based on the intent classification result (e.g., the target tool for the CREATE_STRATEGY intent is create_backup_strategy), and the high-frequency predecessor tools of the target tool are queried from the time-series co-occurrence graph. S2={j|w_norm(j,target_tool)>τ2}; That is, query tools that have been frequently called before the target tool in history. These tools are usually prerequisite steps for the target operation. 
Merge candidate sets and calculate the comprehensive score: S = S1∪S2; For each tool j in the candidate set S, calculate the comprehensive score: score(j)=W[i][j]+λ·w_norm(prev,j); Where λ is the temporal co-occurrence weighting factor (default 0.5), used to balance the contributions of intent relevance and temporal co-occurrence (it is distinct from the boundary adjustment step size, which uses the same symbol); if j is not in S1, then W[i][j]=0, and if j is not in S2 or the session has no previous tool call, then w_norm(prev,j)=0.
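The candidate extraction and scoring can be sketched as below. This is a hedged illustration: the association matrix is assumed to be a nested dictionary, the graph a dictionary of normalized edge weights, and ties are broken alphabetically, none of which the description specifies. The function name `candidate_scores` and the tool names other than `create_backup_strategy` are hypothetical.

```python
def candidate_scores(intent, W, w_norm, prev_tool=None, target_tool=None,
                     tau1=0.3, tau2=0.1, lam=0.5):
    """Build S1 and S2, merge them, and rank tools by the comprehensive score."""
    row = W.get(intent, {})
    s1 = {j for j, rel in row.items() if rel > tau1}
    if prev_tool is not None:
        # Successors of the previous tool call.
        s2 = {k for (j, k), w in w_norm.items() if j == prev_tool and w > tau2}
    elif target_tool is not None:
        # Reverse predecessor query: tools frequently called before the target.
        s2 = {j for (j, k), w in w_norm.items() if k == target_tool and w > tau2}
    else:
        s2 = set()
    scores = {}
    for j in s1 | s2:
        relevance = row[j] if j in s1 else 0.0
        # Per the text, the co-occurrence term is zero when there is no previous call.
        cooc = w_norm.get((prev_tool, j), 0.0) if prev_tool is not None and j in s2 else 0.0
        scores[j] = relevance + lam * cooc
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

W = {"CREATE_STRATEGY": {"create_backup_strategy": 0.9, "list_client_nodes": 0.2}}
w_norm = {
    ("list_client_nodes", "create_backup_strategy"): 0.6,
    ("list_client_nodes", "list_storage_nodes"): 0.4,
}
ranked = candidate_scores("CREATE_STRATEGY", W, w_norm, prev_tool="list_client_nodes")
cold = candidate_scores("CREATE_STRATEGY", W, w_norm, target_tool="create_backup_strategy")
```

In the cold-start case (`cold`), the predecessor `list_client_nodes` enters the candidate set through the reverse query even though its intent relevance of 0.2 is below τ1.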
[0013] The truncation number is dynamically calculated based on the confidence level of intent classification and the dispersion of intent tools. When the confidence level is higher than the preset threshold, the truncation number is small and aggressive pruning is adopted. When the confidence level is lower than the preset threshold, the truncation number is large and conservative pruning is adopted. Based on the merged candidate set sorted in descending order of comprehensive score, tool nodes with lower comprehensive scores are pruned according to the truncation number to obtain a tool subset.
[0014] Two-factor adaptive pruning based on confidence and dispersion. The aggressiveness of tool pruning is dynamically adjusted based on the confidence (conf) of intent classification and the historical tool dispersion (disp) of the intent category. The formula for calculating the dynamic truncation number k is:
k = kbase·(1+η·disp)·sigmoid_inv(conf);
Where sigmoid_inv(conf) = 1 / (1+e^(γ·(conf-μ)));
The meanings and value ranges of the parameters are as follows: kbase is the base truncation number, representing the number of tools retained when both confidence and dispersion are at a moderate level, with a default value of 3; disp is the tool dispersion of the intent, ranging from 0 to 1; high dispersion indicates that the intent may trigger multiple different tools, requiring conservative pruning; η is the dispersion adjustment coefficient, controlling the degree of influence of dispersion on the truncation number, with a default value of 2.0; conf is the confidence of intent classification, i.e., the similarity between the user input vector and the nearest anchor point, ranging from 0 to 1; γ is the confidence sensitivity coefficient, controlling the steepness of the sigmoid function, with a default value of 10; μ is the confidence center point, with a default value of 0.7, at which sigmoid_inv takes a value of 0.5; sigmoid_inv is a decreasing (mirrored) sigmoid function, taking a value less than 0.5 when conf>μ and a value greater than 0.5 when conf<μ.
The design principle of this formula is: When the confidence level conf is high (e.g., >0.8), sigmoid_inv(conf) is small, the truncation number k is small, and aggressive pruning is adopted to retain only a small number of the most relevant tools; When the confidence level conf is low (e.g., <0.6), sigmoid_inv(conf) is large, the truncation number k is large, and conservative pruning is adopted to retain more candidate tools to avoid erroneous pruning; When the dispersion disp is high, the (1+η·disp) factor increases the truncation number, retaining more tools to cover the multiple tools that the intent may trigger.
The final tool subset is dynamically truncated: the dynamic truncation number k is calculated based on the confidence level (conf) and dispersion (disp). Then, the tools in the candidate set S are sorted in descending order of their comprehensive score (score(j)). The k tools with the highest scores (i.e., top-k selection, meaning the first k elements after sorting by score) are retained as the final output tool subset and passed to the large language model for tool selection and parameter generation. To ensure operational integrity, the system sets a minimum retention constraint kmin for different intent types: query intent kmin=1, creation intent kmin=3 (auxiliary query tools must be retained), deletion intent kmin=2, execution intent kmin=2. The final truncation number is max(k, kmin).
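The truncation rule can be checked numerically with a small sketch. Two points are assumptions that the description leaves open: k is rounded up to an integer with a ceiling, and intent types without a stated kmin (such as control intents) fall back to 1. The function names are hypothetical; the default parameter values come from the text.

```python
import math

# Minimum retention per intent type, as listed in the description.
K_MIN = {"query": 1, "create": 3, "delete": 2, "execute": 2}

def sigmoid_inv(conf, gamma=10.0, mu=0.7):
    """Decreasing sigmoid: 0.5 at conf == mu, smaller for higher confidence."""
    return 1.0 / (1.0 + math.exp(gamma * (conf - mu)))

def truncation_number(conf, disp, intent_type, k_base=3, eta=2.0):
    """Dynamic cutoff k, rounded up (assumption) and floored by kmin."""
    k = k_base * (1 + eta * disp) * sigmoid_inv(conf)
    return max(math.ceil(k), K_MIN.get(intent_type, 1))

def prune(ranked_tools, conf, disp, intent_type):
    """Keep the top-k entries of a list already sorted by descending score."""
    return ranked_tools[:truncation_number(conf, disp, intent_type)]

ranked_tools = ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]
confident = prune(ranked_tools, conf=0.9, disp=0.5, intent_type="query")
uncertain = prune(ranked_tools, conf=0.5, disp=0.5, intent_type="query")
```

With disp = 0.5 and conf = μ = 0.7, the formula gives 3 · 2 · 0.5 = 3, which matches the statement that kbase is the number retained at moderate confidence and dispersion.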
[0015] The following technical effects are achieved by using a confidence-adaptive pruning mechanism based on the tool call time-series co-occurrence graph: (1) The tools called in close proximity achieve higher correlation through exponential decay, which is consistent with the multi-step dependency characteristics of backup operations; (2) The backup operation type matching factor ψ gives extra weight to the common pattern of "query first, then operate", making the tool recommendation more in line with the domain's operating habits; (3) The time-series co-occurrence graph automatically adds the dependent tools of multi-step operations to the candidate set. For example, when a user says "create backup strategy", although the intention is to create, the system automatically adds the query node tool, which is often used as a preliminary step, to the candidate; (4) The confidence and dispersion factors adaptively adjust the pruning aggressiveness. When the confidence is high, aggressive pruning improves efficiency, and when the confidence is low, conservative pruning avoids mispruning; (5) The measured data shows that the number of tool definition tokens input to the model is reduced by more than 70%, and the tool selection accuracy is improved by more than 20%.
[0016] Furthermore, the intent-tool mapping table includes the tools corresponding to query-class intents, creation-class intents, deletion-class intents, execution-class intents, and control-class intents, where: The tools corresponding to query-class intents include tools for listing backup policies, obtaining policy details, listing backup tasks, obtaining task details, obtaining task logs, listing client nodes, listing storage nodes, obtaining node status, listing scheduling plans, obtaining scheduling details, listing client configurations, obtaining database configurations, obtaining system status, and obtaining license information. The tools corresponding to creation-class intents include the create backup strategy tool, creation-class auxiliary query tools, and the create scheduling plan tool. The tools corresponding to deletion-class intents include the delete backup strategy tool, the delete task record tool, deletion-class auxiliary query tools, and the delete scheduling plan tool. The tools corresponding to execution-class intents include the execute backup tool, execution-class auxiliary query tools, the execute recovery tool, and the batch execute backup tool. The tools corresponding to control-class intents include the cancel task tool, the stop task tool, the restart task tool, and control-class auxiliary query tools.
[0017] Furthermore, the knowledge consultation intent vectors include installation and deployment consultation, troubleshooting consultation, concept explanation consultation, and best practice consultation; the general dialogue intent vectors include daily dialogue; the operation intent vectors include query intent vectors, execution intent vectors, creation intent vectors, deletion intent vectors, and control intent vectors; the query intent vectors include querying backup policies, querying backup tasks, querying client nodes, querying storage nodes, querying scheduling plans, querying database configuration, and querying system status; the execution intent vectors include executing backup tasks, executing recovery tasks, and batch executing backups; the creation intent vectors include creating backup policies and creating scheduling plans; the deletion intent vectors include deleting backup policies, deleting task records, and deleting scheduling plans; and the control intent vectors include canceling tasks, stopping tasks, and restarting tasks.
[0018] Furthermore, inputting the user's natural language input and the definition information of the tool subset into the large language model to output a tool invocation request, executing a query-type tool invocation request directly, and executing an operation-type tool invocation request only after the user confirms, includes: If the tool subset corresponds to a query-class intent, the large language model outputs the query parameters, and the backup system API is called directly with those parameters to return formatted results. If the tool subset corresponds to a creation-class intent, the creation-class auxiliary query tool is called to obtain the context, the context is input into the large language model, which outputs the creation parameters, a confirmation interface is displayed, and the creation operation is executed after the user confirms; the obtained context includes the system resource information required for the creation operation, such as available client nodes, available storage nodes, existing scheduling plans, and the current client configuration.
[0019] If the tool subset corresponds to a deletion-class intent, the deletion-class auxiliary query tool is invoked, deletion parameters are generated by the large language model, a cascading impact propagation calculation is performed based on the deletion parameters, an impact map is displayed, the user performs hierarchical confirmation, and the deletion operation is executed after the user confirms. If the tool subset corresponds to an execution-class intent, the execution-class auxiliary query tool is called to obtain the target list, the target list is input into the large language model, which outputs the execution range, a cascading impact propagation calculation is performed based on the execution range, the user performs hierarchical confirmation, and the operation is executed after the user confirms. If the tool subset corresponds to a control-class intent, the control-class auxiliary query tool is invoked to query the running tasks, the running tasks are input into the large language model to match the target task, the task status and impact are displayed, and the control operation is executed after the user confirms.
[0020] Furthermore, performing the cascading impact propagation calculation includes: Constructing a data object-tool influence graph. Its nodes include tool nodes and data object nodes. Its edges include operation edges and dependency edges: operation edges run from tool nodes to data object nodes, and dependency edges run from data object nodes to data object nodes. When a tool call request is received, a breadth-first traversal is performed along the influence edges, starting from the tool node corresponding to the tool call request, to calculate the cascade influence score of each data object. The security level is determined based on the magnitude of the cascade influence score, and the corresponding confirmation strategy is applied.
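The traversal can be sketched as follows. The paragraph does not give a scoring formula or level thresholds, so everything numeric here is an assumption made for illustration: each edge carries a weight in (0, 1], a node's score is its parent's score times the edge weight on first visit, and the security level is derived from the summed scores with hypothetical thresholds. The tool and object names are likewise hypothetical.

```python
from collections import deque

def cascade_scores(graph, tool):
    """Breadth-first traversal from a tool node; returns a score per reached data object."""
    scores = {}
    visited = {tool}
    queue = deque([(tool, 1.0)])
    while queue:
        node, score = queue.popleft()
        for neighbor, weight in graph.get(node, []):
            if neighbor in visited:
                continue  # keep the score from the first (shortest) path
            visited.add(neighbor)
            scores[neighbor] = score * weight  # assumed multiplicative attenuation
            queue.append((neighbor, scores[neighbor]))
    return scores

def security_level(scores, high=1.5, medium=0.5):
    """Map the total influence to a level; thresholds are hypothetical."""
    total = sum(scores.values())
    if total >= high:
        return "high"
    return "medium" if total >= medium else "low"

# Operation edge: tool -> data object. Dependency edges: data object -> data object.
graph = {
    "delete_backup_strategy": [("policy:P1", 1.0)],
    "policy:P1": [("schedule:S1", 0.5), ("task:T1", 0.5)],
    "schedule:S1": [("task:T1", 0.5)],
}
scores = cascade_scores(graph, "delete_backup_strategy")
level = security_level(scores)
```

A higher level would then select a stricter confirmation strategy, for example requiring the user to acknowledge each affected object rather than giving a single confirmation.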
[0021] The data objects refer to the business entities in the backup system, specifically including: backup policies, which are configuration objects that define backup rules; backup tasks, which are instances of backup tasks that are actually executed; client nodes, which are protected data source nodes; storage nodes, which are storage targets for backup data; scheduling plans, which are scheduling configurations that define backup execution times; and running tasks, which are backup tasks that are currently in the execution state.
[0022] The present invention also provides an apparatus for the aforementioned natural language-based data backup management method, comprising: The acquisition module is used to acquire the user's natural language input, which is the user's request for the data backup system; A vector encoding module is used to perform semantic vector encoding on natural language input to obtain a semantic vector. The intent classification module is used to calculate the vector distance between the semantic vector and the preset backup domain intent vector set based on nearest neighbor retrieval, and determine the intent category of the user request. The backup domain intent vector set includes knowledge consultation intent vectors, operation intent vectors, and general dialogue intent vectors. The first generation module is used to retrieve relevant document fragments of the user's intent category from the backup knowledge base if the intent category is knowledge consultation, and input the retrieval results into the large language model to generate an answer. The second generation module is used to directly input the user's natural language input into the large language model to generate a response if the intent category is a general dialogue category. The execution module is used to, if the intent category is operation execution, filter out a subset of tools related to the current intent from the full toolset according to a preset intent-tool mapping table. The full toolset includes query tools and operation tools. The module inputs the user's natural language input and the definition information of the tool subset into the large language model and outputs a tool call request. If the tool call request is a query tool call request, it is executed directly. If it is an operation tool call request, it requests confirmation from the user and executes the request after the user confirms.
[0023] The beneficial effects of this invention are as follows: the natural language-based data backup management method and device proposed in this invention solve the problems of incomplete rule coverage, low accuracy, and high maintenance costs in the prior art.
Description of the Drawings
[0024] Figure 1 is an overall architecture diagram of an intelligent data backup management system according to an embodiment of this application.
Detailed Implementation
[0025] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.
[0026] Therefore, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort are within the scope of protection of the invention.
[0027] The specific embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
[0028] As shown in Figure 1, the present invention also provides an intelligent data backup management system comprising the following hierarchical structure. User interaction layer: provides a web interface and an API to receive natural language input from users and display system responses. It supports streaming output, allowing users to see the system's processing and response content in real time.
[0029] The natural language understanding layer comprises a vector encoding module, a bidirectional contrast anchor calibration module, and an intent classification module. The vector encoding module employs a pre-trained multilingual sentence vector model. This model can be a pre-trained model based on the Transformer architecture, such as paraphrase-multilingual-MiniLM-L12-v2, BAAI/bge-m3, or text2vec-base-chinese-paraphrase. In this embodiment, paraphrase-multilingual-MiniLM-L12-v2 is preferred. This model supports over 50 languages, outputs 384-dimensional dense vectors, and offers fast inference and low resource consumption. The intent classification module pre-constructs 18 backup domain intent vector sets, each containing 10 to 30 representative samples. The bidirectional contrast anchor calibration module implements the bidirectional contrast anchor calibration mechanism to construct and dynamically update the backup domain intent vector sets.
[0030] The intelligent decision-making layer comprises a routing decision module, a tool filtering module, and a parameter completion module. The routing decision module determines whether a request should follow a knowledge retrieval path or a tool invocation path based on the intent category. The tool filtering module filters relevant tools based on the intent-tool mapping table. The parameter completion module is responsible for extracting and completing tool parameters from user input and context.
[0031] The secure execution layer includes a tool classification module and a confirmation execution module. The tool classification module categorizes tools into query tools and operation tools. The confirmation execution module generates confirmation requests for operation tools, which can only be executed after user confirmation.
[0032] Backup System Interface Layer: Encapsulates the API interfaces of the underlying backup system, providing a unified way to call tools. It includes over 20 tools covering functions such as policy management, task management, node management, storage management, and system monitoring.
[0033] The session management module is used to save session context and supports multi-turn conversations and history queries.
[0034] Knowledge base layer: Contains technical documents for the backup system, uses a vector database to store vector representations of document fragments, and supports semantic retrieval.
[0035] Query intents include querying backup policies (QUERY_STRATEGY), querying backup tasks (QUERY_TASK), querying client nodes (QUERY_NODE), querying storage nodes (QUERY_STORAGE), querying scheduling plans (QUERY_SCHEDULE), querying database configurations (QUERY_CONFIG), and querying system status (QUERY_SYSTEM). Execution intents include executing backup tasks (EXECUTE_BACKUP), executing recovery tasks (EXECUTE_RESTORE), and performing batch backups (EXECUTE_BATCH). Creation intents include creating a backup policy (CREATE_STRATEGY) and creating a scheduling plan (CREATE_SCHEDULE). Deletion intents include deleting backup policies (DELETE_STRATEGY), deleting task records (DELETE_TASK), and deleting scheduling plans (DELETE_SCHEDULE). Control intents include canceling a task (CANCEL_JOB), stopping a task (STOP_JOB), and restarting a task (RESTART_JOB). Knowledge consultation intents (KNOWLEDGE_QA) include installation and deployment consultation, troubleshooting consultation, concept explanation consultation, and best practice consultation. General dialogue intents include greetings, thanks, farewells, and other everyday conversation.
[0036] The tools corresponding to query intents are: list_backup_strategies (list backup strategies), get_strategy_detail (get strategy details); list_backup_tasks (list backup tasks), get_task_detail (get task details), get_task_logs (get task logs); list_client_nodes (list client nodes), list_storage_nodes (list storage nodes), get_node_status (get node status); list_schedules (list scheduling plans), get_schedule_detail (get scheduling details); list_client_configs (list client configurations), get_database_config (get database configurations); get_system_status (get system status), get_license_info (get license information).
[0037] The tools corresponding to creation intents are: create_backup_strategy (create backup strategy), list_client_nodes (auxiliary query), list_storage_nodes (auxiliary query), list_schedules (auxiliary query), list_client_configs (auxiliary query); create_schedule (create scheduling plan).
[0038] Tools corresponding to deletion intents: delete_backup_strategy (delete backup strategy), list_backup_strategies (auxiliary query); delete_task_record (delete task record), list_backup_tasks (auxiliary query); delete_schedule (delete scheduling plan), list_schedules (auxiliary query).
[0039] The tools corresponding to execution intents are: execute_backup (execute backup), list_backup_strategies (auxiliary query); execute_restore (execute restore), list_backup_tasks (auxiliary query); batch_execute_backup (batch execute backup), list_backup_strategies (auxiliary query).
[0040] Control intents correspond to the following tools: cancel_job (cancel task), stop_job (stop task), restart_job (restart task), and list_running_jobs (auxiliary query).
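The intent-tool mapping described above can be sketched as a lookup table followed by a query/operation split. This is a minimal illustration, not the patented implementation: only four intents are shown, the tool names are taken from the text, and the rule that query tools are recognised by a `list_`/`get_` name prefix is an assumption made for this sketch.

```python
# Illustrative subset of the intent-tool mapping table (tool names from the text).
INTENT_TOOL_MAP = {
    "QUERY_STRATEGY": ["list_backup_strategies", "get_strategy_detail"],
    "CREATE_STRATEGY": [
        "create_backup_strategy",
        "list_client_nodes",
        "list_storage_nodes",
        "list_schedules",
        "list_client_configs",
    ],
    "DELETE_STRATEGY": ["delete_backup_strategy", "list_backup_strategies"],
    "EXECUTE_BATCH": ["batch_execute_backup", "list_backup_strategies"],
}

# Assumption: read-only query tools are identified by their name prefix.
QUERY_PREFIXES = ("list_", "get_")


def filter_tools(intent):
    """Return the tool subset mapped to an intent (empty if the intent is unknown)."""
    return list(INTENT_TOOL_MAP.get(intent, []))


def is_query_tool(tool_name):
    """Query tools only read data; every other tool is treated as an operation tool."""
    return tool_name.startswith(QUERY_PREFIXES)


def needs_confirmation(tool_name):
    """Operation tools require user confirmation before execution."""
    return not is_query_tool(tool_name)
```

With this table, a QUERY_STRATEGY request exposes two tool definitions to the large language model instead of the full toolset, and only operation tools such as delete_backup_strategy trigger a confirmation request.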
[0041] Example 1: This example demonstrates the complete processing flow of a user querying backup strategies and details the online update process of the bidirectional contrast anchor calibration mechanism.
[0042] Step 1: Vector encoding.
[0043] The system uses a sentence vector model to encode the user input, resulting in a 384-dimensional vector. The user input is: "Help me see what backup strategies are available for Oracle databases". The sentence vector model outputs a 384-dimensional vector [0.023, 0.156, 0.089, ..., 0.034] as the input vector.
[0044] Step 2: Intent Classification.
[0045] The system calculates the cosine similarity between the input vector and each intent anchor point in the backup domain intent vector set, where the QUERY_STRATEGY anchor is the central aggregation point of the backup strategy query intent. Similarity calculation results: QUERY_STRATEGY: 0.92 (highest), QUERY_TASK: 0.45, CREATE_STRATEGY: 0.38, KNOWLEDGE_QA: 0.31. The highest similarity, 0.92, is compared with the dynamic confidence threshold. According to the formula threshold_i = α·(1 − 1/√n_i), where the number of samples for the QUERY_STRATEGY intent is n_i = 25 and the global adjustment coefficient is α = 0.7, threshold_QUERY_STRATEGY = 0.7 × (1 − 1/5) = 0.56. Since 0.92 > 0.56, the intent classification result is determined to be QUERY_STRATEGY, with a confidence level of conf = 0.92.
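The classification step can be sketched as follows. The sketch assumes the threshold formula reads threshold_i = α·(1 − 1/√n_i), which reproduces the stated value 0.56 for n_i = 25 and α = 0.7. Returning `None` when the best similarity does not exceed the threshold is an assumption, since this example does not describe the fallback behaviour.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def dynamic_threshold(n_samples, alpha=0.7):
    """Per-intent confidence threshold: alpha * (1 - 1/sqrt(n_samples))."""
    return alpha * (1.0 - 1.0 / math.sqrt(n_samples))


def classify(similarities, sample_counts, alpha=0.7):
    """Pick the most similar intent anchor and accept it only above its threshold."""
    intent = max(similarities, key=similarities.get)
    conf = similarities[intent]
    if conf > dynamic_threshold(sample_counts[intent], alpha):
        return intent, conf
    return None, conf  # assumption: no intent is accepted below the threshold


# Similarity values taken from Example 1.
similarities = {
    "QUERY_STRATEGY": 0.92,
    "QUERY_TASK": 0.45,
    "CREATE_STRATEGY": 0.38,
    "KNOWLEDGE_QA": 0.31,
}
sample_counts = {"QUERY_STRATEGY": 25}
result = classify(similarities, sample_counts)
```

Because the threshold grows with the number of samples behind an anchor, intents with few samples are accepted at lower similarity than well-populated ones.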
[0046] Step 3: Routing decision.
[0047] QUERY_STRATEGY is an operation execution intent, so the request is routed to the tool invocation path.
[0048] Step 4: Tool filtering.
[0049] According to the intent-tool mapping table: QUERY_STRATEGY → [list_backup_strategies, get_strategy_detail]. The dynamic truncation number k is calculated using the confidence-adaptive pruning mechanism with the following inputs: base truncation number k_base = 3; tool dispersion of the QUERY_STRATEGY intent disp = 2/150 = 0.013 (two different tools were triggered in the past 150 calls); dispersion adjustment coefficient η = 2.0; confidence level conf = 0.92; confidence center μ = 0.7; sensitivity coefficient γ = 0.10. The result is k = 1. Because the confidence level is high and the dispersion is low, the system prunes aggressively and retains only the highest-scoring tool, list_backup_strategies.
[0050] Step 5: Large language model invocation.
[0051] The user input and the filtered tool definitions are passed to the large language model. The number of tool definition tokens falls from approximately 2000 (all 20+ tools) to approximately 150 (1 tool), a reduction of 92.5%.
[0052] The large language model returns the tool invocation decision.
[0053] Step 6: Security Classification and Implementation.
[0054] Perform cascading effect propagation calculations on list_backup_strategies: This tool is a read-only query tool that does not modify any data objects. The influence domain is an empty set, R1=true, and the safety level is determined to be L0. It can be executed directly without confirmation.
[0055] Step 7: Return the results.
[0056] The system calls the backup system API to obtain a list of policies, formats it, and returns it to the user.
[0057] Step 8: Online anchor point update (bidirectional contrast anchor calibration).
[0058] After the user confirms that the query results meet expectations, the system triggers a positive attraction update, using the current input vector as a positive sample to update the QUERY_STRATEGY anchor point. First, the intent frequency factor freq_factor is calculated: total number of system calls N_total = 1000; historical number of calls for the QUERY_STRATEGY intent N_i = 350 (query intents are high-frequency). Because QUERY_STRATEGY is a high-frequency intent, freq_factor is small (0.195), so the learning step is small and the anchor point of a high-frequency intent is not over-adjusted.
[0059] Then the positive attraction update is performed: base learning rate α = 0.05; current similarity sim = 0.92 < upper bound threshold τ_upper = 0.95, so the indicator function 1[sim < τ_upper] = 1; the step size is 0.05 × 0.195 × 1 = 0.00975, giving the update formula C_QUERY_STRATEGY(t+1) = C_QUERY_STRATEGY(t) + 0.00975 × (v_pos − C_QUERY_STRATEGY(t)). The anchor point is adjusted slightly, by about 1%, in the direction of the positive sample to gradually adapt to the user's expression habits.
[0060] Finally, the normalization constraint is applied: C_QUERY_STRATEGY(t+1) = C_QUERY_STRATEGY(t+1) / ||C_QUERY_STRATEGY(t+1)||, which ensures that the updated anchor point remains on the unit hypersphere.
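The positive attraction update and the normalization constraint can be sketched in a few lines. The default values (base learning rate 0.05, freq_factor 0.195, τ_upper 0.95) are the ones quoted in this example. The 3-dimensional vectors are toy values chosen for illustration, and the formula that produces freq_factor is not restated here, so it is passed in as a given number.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Project a vector back onto the unit hypersphere."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def positive_update(anchor, v_pos, base_lr=0.05, freq_factor=0.195, tau_upper=0.95):
    """Move a unit-length anchor slightly toward a unit-length positive sample.

    The update is skipped (indicator = 0) when the sample is already closer
    to the anchor than tau_upper.
    """
    sim = dot(anchor, v_pos)  # cosine similarity, since both are unit vectors
    indicator = 1.0 if sim < tau_upper else 0.0
    step = base_lr * freq_factor * indicator  # 0.00975 with the defaults
    moved = [c + step * (p - c) for c, p in zip(anchor, v_pos)]
    return normalize(moved)


# Toy 3-dimensional vectors (the real system uses 384 dimensions).
anchor = normalize([1.0, 0.0, 0.0])
v_pos = normalize([0.9, 0.4, 0.1])
updated = positive_update(anchor, v_pos)
```

After the update the anchor is still unit length and its similarity to the positive sample has increased slightly, which matches the "about 1%" adjustment described above.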
[0061] Example 2: This example demonstrates the complete process of a user creating a backup strategy, focusing on the specific calculation process of the time series co-occurrence graph pruning mechanism and the cascading effect transmission mechanism.
[0062] User input: "Create a daily full backup strategy for a MySQL database".
[0063] Steps 1 and 2: Vector encoding and intent classification.
[0064] The intent classification result is CREATE_STRATEGY, with a confidence level of conf=0.94.
[0065] Step 3: Tool selection based on time series co-occurrence graph (detailed calculation process).
[0066] Step 3.1: Extract the first tool candidate set S1 from the intent-tool relevance matrix. Based on the pre-constructed intent-tool relevance matrix W[i][j], extract tools whose CREATE_STRATEGY intent relevance exceeds the threshold τ1=0.3: W[CREATE_STRATEGY][create_backup_strategy]=1.0 (keyword exact match); W[CREATE_STRATEGY][list_client_nodes]=0.0 (No keyword match); W[CREATE_STRATEGY][list_storage_nodes]=0.0 (No keyword match); W[CREATE_STRATEGY][list_schedules]=0.35 ("Schedules" part matches); W[CREATE_STRATEGY][list_client_configs]=0.0 (No keyword match); The first tool candidate set S1 = {create_backup_strategy, list_schedules}.
[0067] Step 3.2: Extract the predecessor candidate set S2 from the temporal co-occurrence graph.
[0068] Because the current session contains no previous tool call (this is the first round of the conversation), successor tools cannot be extracted from a previous call.
[0069] The system detects the user intent CREATE_STRATEGY and enables the "target tool reverse predecessor lookup" mechanism: it queries the temporal co-occurrence graph for high-frequency predecessor tools of create_backup_strategy, i.e., tools that were historically called frequently before create_backup_strategy. w_norm(list_client_nodes→create_backup_strategy) = 0.85 (in 85% of cases, strategy creation was preceded by a client node query); w_norm(list_storage_nodes→create_backup_strategy) = 0.78 (in 78% of cases it was preceded by a storage node query); w_norm(list_client_configs→create_backup_strategy) = 0.72 (in 72% of cases it was preceded by a client configuration query).
[0070] These high-frequency predecessor tools are added to the candidate set: S2 = {list_client_nodes, list_storage_nodes, list_client_configs}.
Step 3.3: Merge the candidate sets and calculate the overall score.
[0071] S = S1 ∪ S2 = {create_backup_strategy, list_schedules, list_client_nodes, list_storage_nodes, list_client_configs}. The overall score (λ = 0.5) combines the intent-tool relevance matrix W[i][j] with the temporal co-occurrence graph weights: score(j) = W[i][j] + λ·w_reverse(j→target_tool), where w_reverse(j→target_tool) is the normalized predecessor weight obtained in Step 3.2. The results are: score(create_backup_strategy) = 1.0 (the target tool itself has no predecessor relationship); score(list_client_nodes) = 0 + 0.5 × 0.85 = 0.425 (as a high-frequency predecessor of create_backup_strategy); score(list_storage_nodes) = 0.5 × 0.78 = 0.39; score(list_client_configs) = 0.5 × 0.72 = 0.36; score(list_schedules) = 0.35 (intent relevance only, no temporal co-occurrence score).
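Steps 3.1 to 3.3 can be reproduced with the numbers given in this example. The sketch below uses plain dictionaries for the CREATE_STRATEGY row of the relevance matrix and for the reverse predecessor weights; the variable names are illustrative.

```python
TAU_1 = 0.3   # relevance threshold for the first candidate set
LAMBDA = 0.5  # weight of the temporal co-occurrence term

# Row W[CREATE_STRATEGY][*] of the intent-tool relevance matrix (from Step 3.1).
relevance = {
    "create_backup_strategy": 1.0,
    "list_client_nodes": 0.0,
    "list_storage_nodes": 0.0,
    "list_schedules": 0.35,
    "list_client_configs": 0.0,
}

# Normalized predecessor weights w_norm(tool -> create_backup_strategy) (from Step 3.2).
predecessor_weight = {
    "list_client_nodes": 0.85,
    "list_storage_nodes": 0.78,
    "list_client_configs": 0.72,
}

# S1: tools whose intent relevance exceeds the threshold.
s1 = {tool for tool, w in relevance.items() if w > TAU_1}
# S2: high-frequency predecessors of the target tool.
s2 = set(predecessor_weight)
candidates = s1 | s2

# score(j) = W[i][j] + lambda * w_reverse(j -> target_tool)
scores = {
    tool: relevance.get(tool, 0.0) + LAMBDA * predecessor_weight.get(tool, 0.0)
    for tool in candidates
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Sorting by score places create_backup_strategy first, followed by the three predecessor query tools, with list_schedules last.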
[0072] Step 3.4: Confidence-based adaptive pruning.
[0073] Base truncation number k_base = 3; tool dispersion of the CREATE_STRATEGY intent disp = 5/80 = 0.0625 (5 different tools were triggered in 80 historical calls); dispersion adjustment coefficient η = 2.0; confidence level conf = 0.94; sigmoid_inv(0.94) = 0.083; the computed truncation number is k = 1. Since CREATE_STRATEGY is a creation intent, the system applies the minimum retention constraint k_min = 3 (a creation intent must retain at least 3 tools so that auxiliary query tools remain available), so the final k = max(1, 3) = 3.
[0074] After sorting by score, the top 3 tools are retained: final tool subset = {create_backup_strategy(1.0), list_client_nodes(0.425), list_storage_nodes(0.39)}; the number of tool definition tokens is reduced from about 2000 (all 20+ tools) to about 450 (3 tools), a reduction of 77.5%.
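The pruning step can be sketched as below. Two points are assumptions: the form of `sigmoid_inv`, which is chosen here because it reproduces the stated value 0.083 when combined with μ = 0.7 and γ = 0.10 from Example 1, and the fact that the raw truncation number is taken as an input, because the formula that produces k = 1 is not restated in this example.

```python
import math


def sigmoid_inv(conf, mu=0.7, gamma=0.10):
    """Assumed form: a decreasing sigmoid of confidence centred at mu."""
    return 1.0 / (1.0 + math.exp((conf - mu) / gamma))


def prune(scores, k_raw, k_min=1):
    """Keep the top-k tools by score, where k = max(k_raw, k_min)."""
    k = max(k_raw, k_min)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def token_reduction(tokens_before, tokens_after):
    """Fraction of tool-definition tokens saved by pruning."""
    return (tokens_before - tokens_after) / tokens_before


# Overall scores from Step 3.3.
scores = {
    "create_backup_strategy": 1.0,
    "list_client_nodes": 0.425,
    "list_storage_nodes": 0.39,
    "list_client_configs": 0.36,
    "list_schedules": 0.35,
}
# Raw k = 1 from the confidence-adaptive calculation; k_min = 3 for creation intents.
tool_subset = prune(scores, k_raw=1, k_min=3)
```

The minimum retention constraint overrides the aggressive raw value, so the creation tool keeps its two highest-scoring auxiliary query tools.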
[0075] Step 4: Intelligent parameter completion.
[0076] The system extracts parameters from user input: Strategy type: mysql (extracted from "MySQL database", regular expression matches database type keyword); Backup type: full (extracted from "full backup", mapped to the enumeration value full); Scheduling mode: daily (extracted from "daily", mapped to scheduling type daily).
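A minimal sketch of this kind of keyword-based parameter extraction is shown below. The regular expression, the keyword tables, and the output field names are hypothetical; the text states only that database type keywords are matched by regular expression and that "full backup" and "daily" map to the enumeration values full and daily.

```python
import re

# Hypothetical keyword tables; only mysql, full and daily appear in the example.
DB_PATTERN = re.compile(r"\b(mysql|oracle|postgresql)\b", re.IGNORECASE)
BACKUP_TYPES = ("full", "incremental", "differential")
SCHEDULE_TYPES = ("daily", "weekly", "monthly")


def first_keyword(text, keywords):
    """Return the first keyword that occurs as a whole word in the text, if any."""
    for keyword in keywords:
        if re.search(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE):
            return keyword
    return None


def extract_parameters(text):
    """Extract strategy type, backup type and scheduling mode from user input."""
    params = {}
    match = DB_PATTERN.search(text)
    if match:
        params["strategy_type"] = match.group(1).lower()
    backup_type = first_keyword(text, BACKUP_TYPES)
    if backup_type:
        params["backup_type"] = backup_type
    schedule_type = first_keyword(text, SCHEDULE_TYPES)
    if schedule_type:
        params["schedule_type"] = schedule_type
    return params


params = extract_parameters("Create a daily full backup strategy for a MySQL database")
```

Parameters that cannot be extracted are simply absent from the result, leaving them to be completed from context or from the auxiliary query tools.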
[0077] Steps 5, 6, and 7: Large language model invocation, query execution, and secondary invocation.
[0078] The large language model first calls list_client_nodes and list_storage_nodes to obtain a list of available resources, and then generates the complete parameters for create_backup_strategy.
[0079] Step 8: Calculation of cascaded effects (detailed process).
[0080] Step 8a: Starting from the create_backup_strategy tool node, cascade the effects along the data object dependency graph.
[0081] Level 0 (direct tool operation): create_backup_strategy; cascading impact: impact(new_strategy, 0) = 1.5.
[0082] Level 1 (first-level dependency): The creation operation does not affect existing objects, only adds new resources, so there are no affected objects in Level 1.
[0083] Step 8b: Calculate the four-dimensional risk indicators: R1 (Empty influence domain): false (A new object has been created); R2 (Depth of influence): depth=0 (Only Level 0 is affected); R3 (Breadth of influence): breadth=1 (Only one newly created object is affected); R4 (Critical object hit): false (The newly created object has no label yet).
[0084] Step 8c: Security level decision: According to the decision rule: depth≤1 and breadth≤3 and R4=false → Level L1 (shallow limited influence).
[0085] Step 9: Security Confirmation.
[0086] With security level L1, the system sends a brief confirmation request to the user, displaying policy details: "A full MySQL backup strategy will be created and executed daily, storing data in Storage01. Confirm?"
[0087] Step 10: Execute after user confirmation.
[0088] After the user clicks "Confirm Creation", the system performs the creation operation and returns the result.
[0089] Step 11: Update the temporal co-occurrence graph.
[0090] The tool call sequence for this session is [list_client_nodes, list_storage_nodes, create_backup_strategy]. The system updates the edge weights of the temporal co-occurrence graph: for (list_client_nodes→create_backup_strategy), gap = 2: w_new = w_old + 0.824; for (list_storage_nodes→create_backup_strategy), gap = 1: w_new = w_old + 1.11; for (list_client_nodes→list_storage_nodes), gap = 1: w_new = w_old + 0.741. The backup operation type matching factor ψ assigns a 1.5x weight to the "query first, then create" pattern, reinforcing this common operational rule.
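The edge update can be sketched by enumerating every ordered pair of calls in the session together with its gap. The increment formula is not restated in this example, so the form used below, ψ·exp(−0.3·gap), is an assumption: it was chosen because it reproduces the three stated increments to within rounding. The prefix-based rule for ψ is likewise hypothetical.

```python
import math
from collections import defaultdict

QUERY_PREFIXES = ("list_", "get_")


def psi(src, dst):
    """Operation type matching factor: 1.5 for 'query first, then create'."""
    if src.startswith(QUERY_PREFIXES) and dst.startswith("create_"):
        return 1.5
    return 1.0  # assumption: all other patterns are left unweighted


def increment(src, dst, gap, decay=0.3):
    """Assumed increment: exponential decay in the call gap, scaled by psi."""
    return psi(src, dst) * math.exp(-decay * gap)


def update_graph(weights, sequence):
    """Add an increment to the edge (earlier tool -> later tool) for each pair."""
    for i, src in enumerate(sequence):
        for j in range(i + 1, len(sequence)):
            dst = sequence[j]
            weights[(src, dst)] += increment(src, dst, gap=j - i)
    return weights


sequence = ["list_client_nodes", "list_storage_nodes", "create_backup_strategy"]
weights = update_graph(defaultdict(float), sequence)
```

Under these assumptions the three edges receive increments of roughly 0.82, 1.11 and 0.74, close to the values listed in the text; tools called closer together and query-then-create pairs are reinforced most strongly.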
[0091] Example 3: This example demonstrates the processing flow for knowledge consultation requests.
[0092] User input: "How do I install the MySQL backup agent?"
[0093] Steps 1 and 2: Vector encoding and intent classification.
[0094] The intent classification result is KNOWLEDGE_QA, with a confidence level of 0.89.
[0095] Step 3: Routing decision.
[0096] KNOWLEDGE_QA is a knowledge consultation intent, which is routed to the knowledge retrieval path and does not enter the tool invocation process.
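The routing decision used in Examples 1 and 3 amounts to a three-way branch on the intent category. In the sketch below the path names are illustrative, and `GENERAL_CHAT` is a hypothetical label, since the text gives no code name for the general dialogue intent.

```python
def route(intent):
    """Map an intent category to a processing path."""
    if intent == "KNOWLEDGE_QA":
        return "knowledge_retrieval"  # retrieve document fragments, then generate
    if intent == "GENERAL_CHAT":      # hypothetical label for everyday dialogue
        return "direct_llm"           # pass the input straight to the model
    return "tool_invocation"          # query, create, delete, execute, control


path = route("KNOWLEDGE_QA")
```

Because the branch is taken before any tool definitions are assembled, a knowledge consultation request never reaches the tool invocation path.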
[0097] Step 4: Knowledge Retrieval.
[0098] The system uses vector retrieval to find relevant document fragments from the backup knowledge base.
[0099] Step 5: Generate an answer based on the search results.
[0100] The retrieved document fragments are used as context to generate answers using a large language model.
[0101] This example demonstrates that knowledge consultation requests do not mistakenly trigger tool calls, but instead provide accurate document information through the knowledge retrieval path.
[0102] Example 4: This example demonstrates the processing flow of batch operations, focusing on the security level determination process of the cascading effect transmission mechanism in batch operation scenarios.
[0103] User input: "Execute all MySQL-related backup strategies".
[0104] Steps 1 and 2: Vector encoding and intent classification.
[0105] The intent classification result is EXECUTE_BATCH (batch backup execution), with a confidence level of conf=0.88.
[0106] Step 3: Tool filtering.
[0107] Pruning based on the intent-tool mapping table and the temporal co-occurrence graph selects two relevant tools, batch_execute_backup and list_backup_strategies, with list_backup_strategies used to obtain the list of strategies that meet the criteria.
[0108] Step 4: Large language model invocation and scope determination.
[0109] The large language model first calls list_backup_strategies to retrieve all MySQL-related strategies. The returned result contains 5 strategies in total: strategy_001: daily full backup of the MySQL production database (tagged: production environment); strategy_002: daily incremental backup of the MySQL test database; strategy_003: weekly full backup of the MySQL development database; strategy_004: daily full backup of the MySQL log database; strategy_005: monthly full backup of the MySQL archive database.
[0110] Step 5: Calculation of cascaded effects (detailed process).
[0111] Step 5a: Perform cascading effect propagation on the batch_execute_backup tool and calculate the scope of impact. Level 0 (strategies directly operated on by the tool): batch_execute_backup → strategy_001, strategy_002, strategy_003, strategy_004, strategy_005; initial impact score for each strategy: impact(strategy, 0) = 1.0 × ω(STRATEGY) = 1.0 × 1.5 = 1.5. Level 1 (client nodes and storage nodes associated with the strategies): the dependencies of each strategy are queried from the data object dependency graph: strategy_001 → client_node_prod (dependency weight 0.9), storage_node_01 (dependency weight 0.8); strategy_002 → client_node_test (dependency weight 0.9), storage_node_02 (dependency weight 0.8); strategy_003 → client_node_dev (dependency weight 0.9), storage_node_02 (dependency weight 0.8); strategy_004 → client_node_log (dependency weight 0.9), storage_node_03 (dependency weight 0.8); strategy_005 → client_node_archive (dependency weight 0.9), storage_node_03 (dependency weight 0.8). Level decay function: φ(1) = 1/(1 + 0.5 × (1 − 1)²) = 1.0.
[0112] Level 1 impact score calculation: impact(client_node_prod, 1) = 0.9 × 1.5 × 1.0 × ω(CLIENT_NODE) = 0.9 × 1.5 × 1.0 × 1.2 = 1.62; impact(storage_node_01, 1) = 0.8 × 1.5 × 1.0 × 1.0 = 1.2 (the other nodes are calculated similarly).
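The Level 0 and Level 1 calculations can be checked with a short sketch. It assumes the level decay function φ(l) = 1/(1 + 0.5·(l − 1)²), which agrees with the stated values φ(1) = 1.0 and φ(2) = 0.67. The object type weight ω(STORAGE_NODE) = 1.0 is read off the last factor of the storage node calculation above.

```python
# Object type weights omega, as used in the Level 0 and Level 1 calculations.
OMEGA = {"STRATEGY": 1.5, "CLIENT_NODE": 1.2, "STORAGE_NODE": 1.0}


def phi(level):
    """Level decay function: no decay at level 1, increasing decay beyond it."""
    return 1.0 / (1.0 + 0.5 * (level - 1) ** 2)


def child_impact(parent_impact, dependency_weight, level, object_type):
    """Impact passed from a parent object to a dependent object at the given level."""
    return dependency_weight * parent_impact * phi(level) * OMEGA[object_type]


# Level 0: each strategy touched directly by batch_execute_backup.
strategy_impact = 1.0 * OMEGA["STRATEGY"]

# Level 1: objects that strategy_001 depends on.
client_impact = child_impact(strategy_impact, 0.9, 1, "CLIENT_NODE")
storage_impact = child_impact(strategy_impact, 0.8, 1, "STORAGE_NODE")
```

The same function applies to every strategy in the batch, because all five use dependency weights of 0.9 for the client node and 0.8 for the storage node.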
[0113] Level 2 (Potentially Triggered Task): Performing a backup will create a new backup task.
[0114] Check for conflicts with running tasks: strategy_001 currently has a running task task_running_001 (label: production environment).
[0115] Level decay function: φ(2) = 1/(1 + 0.5 × (2 − 1)²) = 0.67; impact score: impact(task_running_001, 2) = 1.74.
[0116] Step 5b: Calculate the four-dimensional risk indicators: R1 (empty influence domain): false (backups of 5 strategies will be executed); R2 (depth of influence): depth = max{level | ∃obj, impact(obj, level) > 0} = 2 (the impact propagates to a running task at Level 2); R3 (breadth of influence): breadth = |{obj | Σ_level impact(obj, level) > 0.01}|; affected objects: 5 strategies + 5 client nodes + 3 storage nodes + 1 running task = 14, so breadth = 14; R4 (critical object hit): the tags in the influence domain are checked; strategy_001 has the "production environment" tag → hit, and task_running_001 has the "production environment" tag → hit, so R4 = true.
[0117] Step 5c: Security level decision. According to the comprehensive decision rules: R4 = true (critical objects are affected) → directly upgraded to Level L3; in addition, breadth = 14 > 10 (the breadth of impact exceeds the threshold) → confirmed as Level L3. Security level determination result: Level L3 (critical objects are affected).
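The level decisions quoted in the examples can be gathered into one function. This is a sketch of rules that are consistent with those examples rather than the complete rule set, which is not restated here. In particular, treating every remaining case as L2 is an assumption.

```python
def security_level(influence_empty, depth, breadth, critical_hit):
    """Derive a security level from the four risk indicators R1 to R4."""
    if influence_empty:                # R1: read-only, nothing is affected
        return "L0"
    if critical_hit or breadth > 10:   # R4 hit, or breadth above the threshold
        return "L3"
    if depth <= 1 and breadth <= 3:    # shallow, limited influence
        return "L1"
    return "L2"                        # assumption: all remaining cases


query_level = security_level(True, 0, 0, False)    # list_backup_strategies
create_level = security_level(False, 0, 1, False)  # create_backup_strategy
batch_level = security_level(False, 2, 14, True)   # batch_execute_backup, 5 strategies
```

Checking the critical-object and breadth conditions before the L1 rule ensures that an operation touching a production-tagged object is never downgraded because its depth happens to be small.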
[0118] Step 6: Level 3 security verification process.
[0119] Step 6a: The system generates an influence map, visually displaying the cascading influence propagation results in a tree structure. Root node: batch_execute_backup (batch execution backup tool). Tree level 1 (Level 0, direct impact): 5 backup strategies; strategy_001 (daily full backup of MySQL production database), tagged: production environment, impact=1.5, marked as warning; strategy_002 (daily incremental backup of MySQL test database), impact=1.5; strategy_003 (weekly full backup of MySQL development database), impact=1.5; strategy_004 (daily full backup of MySQL log database), impact=1.5; strategy_005 (monthly full backup of MySQL archive database), impact=1.5. Tree level 2 (Level 1, indirect impact): 8 nodes; 5 client nodes: client_node_prod, client_node_test, client_node_dev, client_node_log, client_node_archive; 3 storage nodes: storage_node_01, storage_node_02, storage_node_03.
[0120] Tree level 3 (Level 2, potential conflicts): 1 running task; task_running_001, tagged: production environment, impact=1.74, marked as warning; risk description: may lead to resource contention or task conflicts.
[0120] Level 3 (Level 2, Potential Conflicts): 1 running task; task_running_001, tag: production environment, impact=1.74, marked as warning; Risk description: May lead to resource contention or task conflicts.
[0121] Step 6b: The system prompts for administrator privilege verification and generates a recovery plan: "Batch operations were detected involving the production environment (strategy_001) and running tasks (task_running_001), with a security level of L3."
[0122] Recommended actions: Exclude the production environment policy and execute only the other four policies; or wait for task_running_001 to complete before executing; or enter administrator credentials to force the execution of all five policies.
[0123] If problems occur after forced execution, recovery is possible using the following methods: cancel command: cancel_job task_id=xxx; check for conflicts: get_task_log task_id=task_running_001.
Step 7: User selection and execution.
[0124] The user selects Option 1, "Exclude the production environment strategy." The system recalculates the impact scope: new breadth=9 (4 strategies + 4 nodes + 1 storage, no running task conflicts); R4=false (no critical objects hit); new security level: depth=1, breadth=9, R4=false → Level L2. The request is therefore downgraded to Level L2 confirmation: after the impact map is displayed, the user confirms again and the batch backup of the four strategies is performed.
[0125] Step 8: Execution results and audit logs.
[0126] The system recorded complete audit information: Original request: Execute all MySQL-related backup strategies (5); Initial security level: L3 (critical objects hit); User adjustment: Exclude production environment strategies; Final security level: L2; Actual execution: 4 strategies (strategy_002, 003, 004, 005); Execution result: All successful; Confirmation time: 2026-02-05 14:32:15.
[0127] Example 5: This example demonstrates the system's multi-turn dialogue capability. The system uses conversation context memory to achieve cross-turn task association and automatic parameter completion.
[0129] The advantages of this invention compared to the prior art are as follows: (1) Improve the speed and accuracy of intent recognition. The present invention adopts an intent classification method based on semantic vector space mapping. Compared with the hierarchical intent recognition scheme, it does not require multiple layers of model calls. Intent classification can be completed in a single vector distance calculation, with an average response time of less than 50ms. Compared with the keyword matching method, it can handle synonyms, colloquial expressions and multilingual input, and the intent recognition accuracy reaches more than 95%.
[0130] (2) Significantly reduce the cost of calling large language models. Through the dynamic tool pruning mechanism, the tool set is filtered before calling the large language model, reducing the number of tools passed to the model from more than 20 to 3-5, the number of tool definition tokens is reduced by more than 70%, and the model inference time is reduced by more than 40%. Compared with the full tool passing solution, it saves about 60% of the API cost per call.
[0131] (3) Improve the efficiency of operation and maintenance. Operation and maintenance personnel can directly complete operations such as backup policy creation, task execution, and fault query through natural language, without having to switch between multiple interfaces and fill out complex forms. Taking the creation of backup policy as an example, the operation time is reduced from 35 minutes to 30 seconds.
[0132] (4) Achieve fine-grained security control. Through a four-level automatic security classification mechanism (L0 read-only / L1 reversible / L2 irreversible / L3 high-risk in batches), the risk level is automatically determined based on the semantic features of the tool's metadata, eliminating the need for manual configuration of security rules one by one; it provides graded confirmation and audit tracking for high-risk operations, reducing the error rate by more than 95%.
[0133] (5) Intelligent triage of knowledge consultation and operation requests. The system predefines anchor point areas for knowledge consultation intents in the semantic vector space. By pre-judging the intent classification, it realizes dual-path triage of knowledge retrieval and tool invocation, avoiding the accidental triggering of tool execution by knowledge consultation requests (such as "MySQL backup agent installation steps").
[0134] (6) Reduce training costs for maintenance personnel. Maintenance personnel do not need to learn complex operation interfaces and command syntax. They can interact with the backup system through natural language, reducing training time from 2-3 days to 0.5 days.
[0135] (7) Supports incremental expansion of the intent system. When adding a new backup type or operation intent, only a few representative sample texts need to be provided. The system will automatically generate the corresponding anchor vector and update the correlation matrix without retraining the model or modifying the hierarchical tree structure.
[0136] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the technical principles of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.
Claims
1. A data backup management method based on natural language, characterized in that, include: Obtain the user's natural language input, which is the user's request for the data backup system; Semantic vectors are obtained by semantically encoding the natural language input. The vector distance between the semantic vector and the preset backup domain intent vector set is calculated based on nearest neighbor retrieval to determine the intent category of the user's request; If the intent category is knowledge consultation, then retrieve relevant document fragments for the user's intent category from the backup knowledge base, and input the retrieval results into the large language model to generate a response; If the intent category is general dialogue, the user's natural language input is directly input into the large language model to generate a response; If the intent category is an operation execution category, based on the preset intent-tool mapping table, a subset of tools related to the current intent is selected from the full toolset. The user's natural language input and the definition information of the tool subset are input into the large language model to output the tool call request. If the tool call request is a query tool call request, it is executed directly. If it is an operation tool call request, the user is asked for confirmation, and execution is performed after the user confirms.
2. The data backup management method based on natural language according to claim 1, characterized in that performing semantic vector encoding on the natural language input to obtain a semantic vector comprises: using a pre-trained multilingual sentence vector model to map the user input into a high-dimensional semantic vector space and generate a dense vector representation of fixed dimension.
3. The data backup management method based on natural language according to claim 1, characterized in that the backup domain intent vector set is constructed and dynamically updated using a bidirectional contrast anchor calibration mechanism, which comprises: for each intent category, obtaining the sample text set corresponding to that intent category, encoding the sample texts into vectors with a pre-trained sentence vector model, calculating the centroid of all the vectors, using the centroid as the initial anchor point, and normalizing all anchor points; when the intent category of a user's request is correctly classified and the user confirms execution, using the semantic vector corresponding to the user's request as a positive sample and moving the anchor point toward the positive sample to update the anchor point; when the intent category of a user's request is misclassified, using the semantic vector corresponding to the user's request as a negative sample and moving the anchor point away from the negative sample to update the anchor point; and normalizing the anchor point after each anchor point update.
4. The data backup management method based on natural language according to claim 3, characterized in that, after each anchor point update, the distance between adjacent intent anchor points is checked, and if the distance between any two anchor points is less than a preset safety margin, the two anchor points are moved apart along the line connecting them.
5. The data backup management method based on natural language according to claim 1, characterized in that selecting a subset of tools related to the current intent from the full toolset comprises: constructing a tool predecessor-successor temporal co-occurrence graph, the nodes of which are tool nodes; extracting a first tool candidate set based on an intent-tool association matrix, and extracting a second tool candidate set based on the tool predecessor-successor temporal co-occurrence graph, wherein, if the conversation context of the user's natural language input contains a previous tool call, the successor tools of that tool are extracted, and otherwise a target-tool reverse predecessor query mechanism is activated to query the high-frequency predecessor tools of the target tool in the tool predecessor-successor temporal co-occurrence graph; merging the first tool candidate set and the second tool candidate set to obtain a merged candidate set, and calculating a comprehensive score for each tool node in the merged candidate set; dynamically calculating a truncation number based on the intent classification confidence and the intent-tool dispersion; and sorting the merged candidate set in descending order of comprehensive score and pruning the tool nodes with lower comprehensive scores according to the truncation number to obtain the tool subset.
6. The data backup management method based on natural language according to claim 1, characterized in that the intent-tool mapping table includes tools corresponding to query-class intents, creation-class intents, deletion-class intents, execution-class intents, and control-class intents; the tools corresponding to query-class intents include tools for listing backup policies, obtaining policy details, listing backup tasks, obtaining task details, obtaining task logs, listing client nodes, listing storage nodes, obtaining node status, listing scheduling plans, obtaining scheduling details, listing client configurations, obtaining database configurations, obtaining system status, and obtaining license information; the tools corresponding to creation-class intents include a tool for creating backup policies, a creation-class auxiliary query tool, and a tool for creating scheduling plans; the tools corresponding to deletion-class intents include a tool for deleting backup policies, a tool for deleting task records, a deletion-class auxiliary query tool, and a tool for deleting scheduling plans; the tools corresponding to execution-class intents include a tool for executing backups, an execution-class auxiliary query tool, a tool for executing recovery, and a tool for executing batch backups; and the tools corresponding to control-class intents include a tool for canceling tasks, a tool for stopping tasks, a tool for restarting tasks, and a control-class auxiliary query tool.
7. The data backup management method based on natural language according to claim 6, characterized in that the knowledge consultation intent vectors include installation and deployment consultation, troubleshooting consultation, concept explanation consultation, and best practice consultation; the general dialogue intent vectors include daily dialogue; the operation intent vectors include query-class intent vectors, execution-class intent vectors, creation-class intent vectors, deletion-class intent vectors, and control-class intent vectors; the query-class intent vectors include querying backup policies, querying backup tasks, querying client nodes, querying storage nodes, querying scheduling plans, querying database configuration, and querying system status; the execution-class intent vectors include executing backup tasks, executing recovery tasks, and executing batch backups; the creation-class intent vectors include creating backup policies and creating scheduling plans; the deletion-class intent vectors include deleting backup policies, deleting task records, and deleting scheduling plans; and the control-class intent vectors include canceling tasks, stopping tasks, and restarting tasks.
8. The data backup management method based on natural language according to claim 7, characterized in that inputting the user's natural language input and the definition information of the tool subset into the large language model, outputting a tool call request, executing the tool call request directly if it is a query tool call request, and requesting confirmation from the user and executing it after the user confirms if it is an operation tool call request, comprises: if the tool subset consists of tools corresponding to a query-class intent, outputting query parameters from the large language model, calling the backup system API directly based on the query parameters, and returning formatted results; if the tool subset consists of tools corresponding to a creation-class intent, calling the creation-class auxiliary query tool to obtain context, inputting the context into the large language model, outputting creation parameters, displaying a confirmation interface for the user, and executing the creation operation after the user confirms; if the tool subset consists of tools corresponding to a deletion-class intent, calling the deletion-class auxiliary query tool, generating deletion parameters through the large language model, performing a cascading impact propagation calculation based on the deletion parameters, presenting the result of the cascading impact propagation calculation to the user for confirmation, and executing the deletion operation after the user confirms; if the tool subset consists of tools corresponding to an execution-class intent, calling the execution-class auxiliary query tool to obtain a target list, inputting the target list into the large language model, outputting the execution range, performing a cascading impact propagation calculation based on the execution range, having the user perform hierarchical confirmation, and executing the operation after the user confirms; and if the tool subset consists of tools corresponding to a control-class intent, calling the control-class auxiliary query tool to query the running tasks, inputting the running tasks into the large language model to match the target task, displaying the task status and impact, and executing the control operation after the user confirms.
9. The data backup management method based on natural language according to claim 8, characterized in that performing the cascading impact propagation calculation comprises: constructing a data object-tool impact graph, wherein the nodes of the data object-tool impact graph include tool nodes and data object nodes, the edges of the data object-tool impact graph include operation edges and dependency edges, the operation edges being edges from tool nodes to data object nodes and the dependency edges being edges from data object nodes to data object nodes; when a tool call request is received, performing, based on a graph traversal algorithm, a breadth-first traversal along the impact edges starting from the tool node corresponding to the tool call request to calculate the cascade impact score of each data object; and determining a security level based on the magnitude of the cascade impact score and applying the corresponding confirmation strategy.
10. An apparatus for the natural language-based data backup management method according to any one of claims 1-9, characterized in that it comprises: an acquisition module, configured to acquire a user's natural language input, the natural language input being the user's request to a data backup system; a vector encoding module, configured to perform semantic vector encoding on the natural language input to obtain a semantic vector; an intent classification module, configured to calculate, based on nearest neighbor retrieval, the vector distance between the semantic vector and a preset backup domain intent vector set and to determine the intent category of the user's request, wherein the backup domain intent vector set includes knowledge consultation intent vectors, operation intent vectors, and general dialogue intent vectors; a first generation module, configured to, if the intent category is knowledge consultation, retrieve document fragments relevant to the user's intent category from a backup knowledge base and input the retrieval results into a large language model to generate a response; a second generation module, configured to, if the intent category is general dialogue, input the user's natural language input directly into the large language model to generate a response; and an execution module, configured to, if the intent category is operation execution, select a subset of tools related to the current intent from the full toolset according to a preset intent-tool mapping table, the full toolset including query tools and operation tools, input the user's natural language input and the definition information of the tool subset into the large language model, and output a tool call request, wherein, if the tool call request is a query tool call request, it is executed directly, and if it is an operation tool call request, confirmation is requested from the user and the request is executed after the user confirms.
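As a non-limiting illustration of the cascading impact propagation calculation recited in claim 9, the following sketch performs a breadth-first traversal from a tool node over operation edges and dependency edges. The claims do not specify how the score is computed, so the sketch assumes that each edge carries a weight in (0, 1] and that a data object's score is the product of the edge weights along the first path on which it is reached. The node names, the weights, and the two thresholds that map scores to security levels are likewise hypothetical.

```python
from collections import deque


def cascade_scores(edges, tool_node):
    """Breadth-first traversal from a tool node.

    edges: node -> list of (neighbor, weight); covers both operation
    edges (tool -> data object) and dependency edges (data object ->
    data object). Returns data object -> cascade impact score, where
    the score is the product of weights along the first-found path
    (an assumption made for this sketch).
    """
    scores = {}
    visited = {tool_node}
    queue = deque([(tool_node, 1.0)])
    while queue:
        node, strength = queue.popleft()
        for neighbor, weight in edges.get(node, []):
            if neighbor in visited:
                continue
            visited.add(neighbor)
            scores[neighbor] = strength * weight
            queue.append((neighbor, scores[neighbor]))
    return scores


def security_level(scores, high=0.7, medium=0.3):
    """Map the largest cascade impact score to a level (assumed thresholds)."""
    peak = max(scores.values(), default=0.0)
    if peak >= high:
        return "high"
    if peak >= medium:
        return "medium"
    return "low"


# Toy impact graph: deleting a policy touches the policy itself, which
# a scheduling plan and a set of task records depend on.
edges = {
    "tool:delete_backup_policy": [("policy:daily", 1.0)],
    "policy:daily": [("schedule:nightly", 0.8), ("task_records:daily", 0.5)],
    "schedule:nightly": [("task_records:daily", 0.9)],
}

scores = cascade_scores(edges, "tool:delete_backup_policy")
print(scores)
print(security_level(scores))
```

A stricter confirmation strategy (for example, hierarchical confirmation) would then be selected for the higher levels; how levels map to strategies is left open here because the claims do not fix it.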