A scenario compliance response output method and device integrating dynamic routing and LoRA

By integrating dynamic routing with the LoRA module, an adapter library is built, which solves the problem that existing technologies cannot automatically adapt to compliance rules in multiple regions and fields, achieving efficient and accurate compliance response output, and reducing computational costs and training complexity.

CN122242616A · Pending · Publication Date: 2026-06-19 · GUANGDONG ELECTRIC POWER SCI RES INST ENERGY TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: GUANGDONG ELECTRIC POWER SCI RES INST ENERGY TECH CO LTD
Filing Date: 2026-03-23
Publication Date: 2026-06-19

Application Information

Patent Timeline

23 Mar 2026: Application
19 Jun 2026: Publication (CN122242616A)

IPC: G06N3/084; G06N5/04; G06F9/50

AI Tagging

Application Domain

Resource allocation; Biological models

AI Technical Summary

Technical Problem

Existing technologies cannot efficiently and accurately adapt to compliance rules in different regions and fields based on user requests, resulting in output answers that cannot meet the compliance requirements of the corresponding scenarios, and are also costly and lack scalability.

Method used

By building an adapter library and integrating the LoRA module with the basic large model, dynamic routing and matching of user requests are achieved. Combined with lightweight LoRA module training and weighted preference optimization, a fast response that meets compliance requirements can be realized.

Benefits of technology

It enables efficient and accurate output of compliant answers in different regions and fields, reduces computing costs, and improves response speed and accuracy. The LoRA module in the adapter library has scenario-specific and professional capabilities.

✦ Generated by Eureka AI based on patent content.

Figure CN122242616A_ABST

Abstract

This invention discloses a method and apparatus for scenario compliance response output that integrates dynamic routing and LoRA. The method includes: selecting a target LoRA module matching a user request from an adapter library; obtaining a synthetic model by integrating the target LoRA module with a basic large model; responding to the user request with the synthetic model; and outputting an answer that conforms to the compliance requirements of the region and domain corresponding to the target LoRA module. Modular LoRA adapters encapsulate multi-scenario compliance rules, dynamic routing automatically matches user requests to target adapters, and the basic large model and the target adapter are integrated in a lightweight manner, so the method adapts automatically to compliance rules in different regions and domains. This solves the problem that existing technologies cannot automatically adapt to such rules based on user requests and therefore struggle to output, efficiently and accurately, answers that conform to the compliance requirements of the corresponding scenario.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, and in particular to a method and apparatus for scenario compliance response output that integrates dynamic routing and LoRA.

Background Technology

[0002] Today, general artificial intelligence technologies centered on Large Language Models (LLMs) and Multimodal Large Models (MLLMs) are encountering increasingly severe and complex compliance challenges as they are deployed professionally across different industries and regions. The significant differences in laws and regulations across regions and industries, coupled with varying social ethics and cultural customs, create substantial scenario variations and demand fine-grained alignment of the model's output. Current mainstream model alignment technologies, represented by reinforcement learning from human feedback (RLHF) combined with the proximal policy optimization (PPO) algorithm, can guide model behavior to some extent. However, this process requires building an independent reward model and conducting reinforcement learning training, relies on expert-labeled data, and necessitates training a new model from scratch or large-scale fine-tuning when adapting to new domains or regional regulatory requirements. Furthermore, it typically treats preference data as binary (preferred versus rejected).

[0003] However, existing technologies are costly and rely on a large amount of expert-annotated data, making it difficult to support the accumulation and training of compliance data across multiple regions and fields. Furthermore, existing technologies lack scalability, requiring retraining or large-scale fine-tuning to adapt to new scenarios, and cannot quickly respond to compliance changes in regions and fields. In addition, the integration of multiple requirements into a single model can easily lead to compromises in performance, and the binary preference processing mechanism ignores the priority of high-risk compliance rules, making it difficult to accurately output answers that meet the compliance requirements of the corresponding scenarios.

Summary of the Invention

[0004] This invention provides a method and apparatus for scenario compliance response output that integrates dynamic routing and LoRA, in order to solve the problem that existing technologies cannot automatically adapt to compliance rules in different regions and fields according to user requests, and therefore struggle to output, efficiently and accurately, answers that meet the compliance requirements of the corresponding scenario.

[0005] To achieve the above objectives, this invention provides a scenario compliance response output method that integrates dynamic routing and LoRA, comprising: obtaining a user request; selecting from the adapter library a target LoRA module that matches the user request, and obtaining a synthetic model by integrating the target LoRA module with the basic large model, wherein each LoRA module in the adapter library is obtained by training a LoRA adapter on the basic large model with a different dataset, and each LoRA module is adapted to a different region and domain scenario; and responding to the user request with the synthetic model and outputting an answer that meets the compliance requirements of the region and domain scenario corresponding to the target LoRA module.
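The three steps above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the names `LoRAModule`, `merge`, and `respond` are hypothetical, the toy 2x2 weight stands in for one layer of the basic large model, and the merge uses the standard LoRA update W + (alpha / r) * B * A, which the text does not spell out.

```python
from dataclasses import dataclass

Matrix = list[list[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

@dataclass(frozen=True)
class LoRAModule:
    scenario: str       # scenario tag such as "R1-EP"
    A: Matrix           # r x d_in down-projection
    B: Matrix           # d_out x r up-projection
    alpha: float = 1.0  # assumed scaling hyperparameter

def merge(base_w: Matrix, lora: LoRAModule) -> Matrix:
    """Return W + (alpha / r) * B @ A without mutating the frozen base weight."""
    rank = len(lora.A)
    delta = matmul(lora.B, lora.A)
    scale = lora.alpha / rank
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base_w, delta)]

def respond(request: str, base_w: Matrix, library: dict[str, LoRAModule]) -> tuple[str, Matrix]:
    """Obtain the request, select the adapter, and build the merged weight."""
    tag = request[1:request.index("]")]  # assumes an explicit "[TAG]" prefix
    return tag, merge(base_w, library[tag])

base = [[1.0, 0.0], [0.0, 1.0]]
library = {"R1-EP": LoRAModule("R1-EP", A=[[1.0, 1.0]], B=[[0.5], [0.0]], alpha=2.0)}
tag, merged = respond("[R1-EP] Please explain the advantages of UHV power transmission",
                      base, library)
```

Because `merge` returns a new matrix, the base weight can be reused unchanged for the next request that selects a different adapter.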

[0006] First, this invention trains LoRA adapters specifically for several regional and domain scenarios and constructs an adapter library. It modularly embeds compliance rules for different scenarios into corresponding LoRA modules, turning generalized and vague compliance logic into scenario-specific, precisely encapsulated logic and providing a technical carrier for adapting to different compliance rules. Second, it achieves dynamic routing through a matching mechanism between user requests and target LoRA modules, automatically identifying the regional or domain scenario corresponding to a user request without manual intervention and accurately calling the LoRA module with built-in compliance rules for that scenario, which solves the core problem that existing technologies cannot adapt automatically. Third, the lightweight nature of LoRA modules allows integration with the basic large model without full retraining; only lightweight parameter fine-tuning is needed to quickly form a synthetic model. This retains the general reasoning capabilities of the basic large model while avoiding the inefficiency of full model training, achieving high-efficiency response. Finally, the reasoning process of the synthetic model is constrained by the scenario-specific compliance rules in the matching LoRA module, ensuring that the output answer adheres to the compliance requirements of the corresponding region and domain. At the same time, the lightweight integration mode preserves reasoning efficiency, ultimately achieving automatic scenario adaptation, efficient response, and accurate output of compliant answers, addressing the shortcomings of existing technologies.

[0007] Compared to existing technologies, this invention encapsulates multi-scenario compliance rules through modular LoRA adapters, achieves automatic matching of user requests and target adapters through dynamic routing, and integrates the basic large model and target adapters in a lightweight manner. It can automatically adapt to compliance rules in different regions and fields, and efficiently and accurately output answers that meet the compliance requirements of the corresponding scenarios. Therefore, it can solve the problem that existing technologies cannot automatically adapt to compliance rules in different regions and fields according to user requests, and it is difficult to efficiently and accurately output answers that meet the compliance requirements of the corresponding scenarios.

[0008] As a preferred embodiment, the dataset is obtained in the following manner: For several target alignment scenarios in different regions and fields, publicly available data and authorized private data are collected. By adding corresponding structured prefixes to the collected data, a task fine-tuning dataset is obtained; wherein, the structured prefix is a scenario identifier bound to the region and field. Based on the structured prefix, the basic large model is guided to generate initial output results for the target core requirements of the several target alignment scenarios, and the initial output results are rated for preference intensity to obtain a hierarchical preference dataset; The dataset consists of the task fine-tuning dataset and the hierarchical preference dataset.

[0009] This preferred approach binds regional and domain identifiers with structured prefixes, giving the task fine-tuning dataset inherent scenario attributes and preventing the model from misjudging scenarios. At the same time, by adding preference ratings of the outputs generated by the basic large model, objective data is combined with subjective needs, resulting in a dataset more closely aligned with real-world application scenarios. The combination of publicly available and authorized private data ensures data legitimacy and richness, while the hierarchical preference dataset provides clear guidance for subsequent model alignment, enabling the trained model to accurately respond to different scenario requirements, reduce meaningless output, and improve the compliance and practicality of the responses.

[0010] As a preferred embodiment, rating the initial output results for preference intensity to obtain the hierarchical preference dataset specifically comprises: classifying the initial output results by risk level and rating them for preference intensity to form an initial hierarchical preference dataset; and, according to a hierarchical weighting mechanism, assigning corresponding weights to samples with different risk levels and preference intensities in the initial hierarchical preference dataset to obtain the hierarchical preference dataset; wherein the hierarchical weighting mechanism is set based on the risk level and preference grading information in the hierarchical preference dataset, and the weights are positively correlated with the risk level and preference intensity of the samples.
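One way to realize a weight that grows with both risk level and preference intensity is a simple product, sketched below. The numeric factors in `RISK_FACTOR` and the product form are assumptions chosen for illustration; the text only requires the positive correlation. The L1 to L4 labels and the 1 to 5 intensity scale follow the embodiment described later in the document, where L1 is the highest risk.

```python
# Assumed multipliers: L1 (extremely high risk) gets the largest factor.
RISK_FACTOR = {"L1": 4.0, "L2": 3.0, "L3": 2.0, "L4": 1.0}

def sample_weight(risk_level: str, pref_strength: int) -> float:
    """Weight that increases with risk level and with preference intensity (1-5)."""
    if risk_level not in RISK_FACTOR:
        raise ValueError(f"unknown risk level: {risk_level!r}")
    if not 1 <= pref_strength <= 5:
        raise ValueError("pref_strength must be between 1 and 5")
    return RISK_FACTOR[risk_level] * (pref_strength / 5)

high = sample_weight("L1", 5)  # highest risk, strongest preference
low = sample_weight("L4", 1)   # lowest risk, weakest preference
```

Any other monotone mapping (for example, an additive one) would satisfy the same requirement; the product is used here only because it keeps both factors visible.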

[0011] This optimized rating method achieves refined processing of preference data, breaking through the limitations of traditional single-dimensional evaluation. The multi-dimensional rating of risk level and preference intensity comprehensively characterizes the value attributes of the output results, making the resulting initial dataset more meaningful. The hierarchical weighting mechanism, through the positive correlation between weights and risk level and preference intensity, prioritizes high-risk, high-preference samples during training, allowing the model to learn safe, compliant, and relevant knowledge first. This differentiated weight allocation avoids the problem of unclear model focus caused by balanced weights, strengthens risk control capabilities, and improves the quality and security of the model output.

[0012] As a preferred embodiment, each LoRA module in the adapter library is obtained by training LoRA adapters based on the base large model and different datasets, and each LoRA module is adapted to different regional and domain scenarios, specifically: Based on the task fine-tuning dataset in the dataset, in the network layer of the basic large model, a LoRA adapter is independently trained for each region and domain scene combination in the several target alignment scenarios to obtain several initial LoRA modules; Based on the hierarchical preference dataset in the dataset, the initial LoRA modules are aligned and strengthened by weighted direct preference optimization to obtain the LoRA modules that constitute the adapter library.

[0013] This preferred approach, based on initial training with task fine-tuning data, enables each module to master the specific technical parameters and fundamental compliance rules of its scenario, ensuring the module's scenario-specific relevance. Alignment reinforcement is achieved through weighted direct preference optimization, which is more stable and efficient than traditional reinforcement learning from human feedback and eliminates the need for complex reward model training, reducing computational costs. This two-stage training approach, which combines fundamental capabilities with preference reinforcement, enables the LoRA module to possess both professional capabilities and alignment with user preferences and scenario requirements. The resulting adapter library provides high-quality module support for subsequent dynamic routing, improving the response quality of the synthetic model.

[0014] As a preferred embodiment, based on the task fine-tuning dataset in the dataset, in the network layer of the base model, an independent LoRA adapter is trained for each region and domain scene combination in the several target alignment scenarios, resulting in several initial LoRA modules, specifically: In the target network layer of the basic large model, an independent LoRA adapter is injected in parallel for each region and domain combination under several target alignment scenarios; From the task fine-tuning dataset of the dataset, retrieve the task fine-tuning data corresponding to each region and domain combination, and input the retrieved data into the LoRA adapter dedicated to the corresponding region and domain combination for distributed multi-task parallel training, so that each LoRA adapter learns the exclusive technical parameters and basic compliance rules of the corresponding scenario. After all LoRA adapters have been trained with corresponding task fine-tuning data in the aforementioned target alignment scenarios to meet preset index requirements, the aforementioned initial LoRA modules bound to the scenarios are obtained.
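The parallel injection described above can be illustrated with a single toy layer that holds one frozen base weight and one independent low-rank adapter per scenario. This is a sketch under assumptions: the class name `LoRALinear`, the constant initialization of `A` (normally random), and the `alpha` scaling are illustrative, and the direct assignment to `B` merely stands in for a real training update.

```python
from typing import Sequence

def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> list[float]:
    """Matrix-vector product in plain Python."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class LoRALinear:
    """A frozen base weight with one independent low-rank adapter per scenario."""

    def __init__(self, base_w, rank: int, scenarios: list[str], alpha: float = 1.0):
        self.base_w = tuple(tuple(row) for row in base_w)  # immutable, so it stays frozen
        self.scale = alpha / rank
        d_out, d_in = len(base_w), len(base_w[0])
        # B starts at zero, so every adapter initially leaves the base output unchanged.
        self.adapters = {
            s: {"A": [[0.01] * d_in for _ in range(rank)],
                "B": [[0.0] * rank for _ in range(d_out)]}
            for s in scenarios
        }

    def forward(self, x: Sequence[float], scenario: str) -> list[float]:
        adapter = self.adapters[scenario]
        base_out = matvec(self.base_w, x)
        low_rank = matvec(adapter["B"], matvec(adapter["A"], x))
        return [b + self.scale * l for b, l in zip(base_out, low_rank)]

layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]], rank=1, scenarios=["R1-EP", "R2-MED"])
x = [2.0, 3.0]
layer.adapters["R1-EP"]["B"][0][0] = 1.0  # stand-in for a training update on R1-EP only
out_ep = layer.forward(x, "R1-EP")
out_med = layer.forward(x, "R2-MED")
```

Updating the `R1-EP` adapter changes only the `R1-EP` output; the `R2-MED` path still returns the base output, which is the isolation between scenarios that the paragraph relies on.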

[0015] This preferred approach achieves efficient scenario-based training through its initial LoRA module training method, resolving the issues of scenario interference and low efficiency in traditional training. The parallel injection of LoRA adapters freezes the core parameters of the base model while avoiding impact on its performance, ensuring training stability. Distributed multi-task parallel training allows each module to independently learn its corresponding scenario data, effectively avoiding cross-interference between different scenario knowledge and ensuring the purity of module knowledge. Targeted training with dedicated data enables each module to accurately grasp scenario knowledge and compliance rules. The trained scenario-binding modules provide a high-quality foundation for subsequent optimization, significantly improving the efficiency and quality of building the subsequent adapter library.

[0016] As a preferred embodiment, based on the hierarchical preference dataset in the dataset, aligning and strengthening the initial LoRA modules with weighted direct preference optimization to obtain the LoRA modules that constitute the adapter library specifically comprises: According to scenario, each of the initial LoRA modules and its matching data from the hierarchical preference dataset are allocated to independent computing nodes of the distributed training system; In each computing node, based on the direct preference optimization algorithm, the sample weights of the hierarchical preference dataset are substituted into the direct preference optimization loss function. Training maximizes the log probability difference between the target preferred answer and the target dispreferred (rejected) answer generated by the initial LoRA module, and the value of that difference at the current step is the actual log probability difference. The sample weights are calculated based on the risk level and preference intensity of the corresponding samples. The weighted preference optimization loss value for the current training step is calculated from the direct preference optimization loss function and the actual log probability difference. The parameters of the initial LoRA modules are then updated by backpropagation on the weighted preference optimization loss value until the initial LoRA modules meet the preset index requirements, resulting in the optimized LoRA modules that compose the adapter library.
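A weighted variant of the direct preference optimization loss might look like the sketch below. It assumes the standard DPO form, in which the margin is the policy-versus-reference log probability difference for the chosen answer minus the same difference for the rejected answer, and it assumes the sample weight enters multiplicatively with normalization by the weight sum. The text does not fix either choice, and the dictionary keys and `beta` value are hypothetical.

```python
import math

def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)), i.e. log(1 + exp(-z))."""
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

def weighted_dpo_loss(batch: list[dict], beta: float = 0.1) -> float:
    """Weighted mean of per-sample DPO losses over a batch."""
    total, weight_sum = 0.0, 0.0
    for s in batch:
        # Log probability difference between chosen and rejected answers,
        # each measured relative to the frozen reference model.
        margin = ((s["policy_chosen_logp"] - s["ref_chosen_logp"])
                  - (s["policy_rejected_logp"] - s["ref_rejected_logp"]))
        total += s["weight"] * neg_log_sigmoid(beta * margin)
        weight_sum += s["weight"]
    return total / weight_sum

neutral = {"policy_chosen_logp": -1.0, "policy_rejected_logp": -1.0,
           "ref_chosen_logp": -1.0, "ref_rejected_logp": -1.0, "weight": 3.0}
improved = dict(neutral, policy_chosen_logp=-0.5, policy_rejected_logp=-2.0)

loss_neutral = weighted_dpo_loss([neutral])
loss_improved = weighted_dpo_loss([improved])
```

With a zero margin the loss equals log 2; raising the chosen answer's log probability and lowering the rejected one's reduces it, and a larger `weight` makes a sample count for more in the batch average.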

[0017] This optimized scheme enables parallel optimization of modules through distributed node allocation, significantly shortening training time and improving overall efficiency. Based on the direct preference optimization algorithm, and combined with the calculation of log probability differences using sample weights, the optimization objective is more closely aligned with scenario requirements, avoiding performance bias caused by indiscriminate optimization. Updating parameters using weighted loss values ensures the model prioritizes optimizing high-value samples, enabling modules to quickly meet preset performance requirements. The optimized LoRA module has stronger scenario adaptability, and the built adapter library provides more reliable module support for large models, improving response quality during the inference phase.

[0018] As a preferred option, a target LoRA module matching the user request is selected from the adapter library, specifically as follows: The user request is preprocessed and embedded to generate input representation data containing input semantic information and prefix identifiers; wherein, the prefix identifier is a scene identifier bound to the region and domain corresponding to the user request; The input representation data is passed into the routing network, and the matching degree of the prefix identifier and the structured prefix of the corresponding scenario of all LoRA modules in the adapter library is calculated through the routing network. The selection probability distribution data of all LoRA modules is then output. Based on the selection probability distribution data, the LoRA module with the highest probability is selected from all LoRA modules in the adapter library to obtain the target LoRA module that matches the user request.
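The selection step can be sketched as scores, then a softmax, then an argmax. In the sketch below, `matching_scores` is a deliberately crude stand-in for the learned routing network: it gives a fixed logit to an exact prefix match and zero otherwise, and the `sharpness` value is an assumption. Only the softmax and highest-probability selection mirror the described procedure directly.

```python
import math

def matching_scores(request_prefix: str, module_prefixes: list[str],
                    sharpness: float = 5.0) -> dict[str, float]:
    """Toy stand-in for the routing network: one logit per adapter in the library."""
    return {p: (sharpness if p == request_prefix else 0.0) for p in module_prefixes}

def softmax(scores: list[float]) -> list[float]:
    """Convert logits to a probability distribution (shifted for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_module(scores: dict[str, float]) -> tuple[str, dict[str, float]]:
    """Return the adapter with the highest selection probability and the full distribution."""
    names = list(scores)
    probs = softmax([scores[n] for n in names])
    distribution = dict(zip(names, probs))
    return max(distribution, key=distribution.get), distribution

target, distribution = select_module(matching_scores("R2-MED", ["R1-EP", "R2-MED"]))
```

A real routing network would also use the semantic part of the input representation, which matters when a request arrives without an explicit prefix.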

[0019] This optimized selection method improves the accuracy and efficiency of LoRA module matching, resolving the issue of chaotic module selection across multiple scenarios. The input representation includes both semantic and scenario prefix information, ensuring the routing network can calculate similarity from both semantic and scenario matching dimensions. The probability distribution data output by the routing network provides a quantitative basis for selection, avoiding errors from subjective judgment. By selecting the module with the highest probability, a high degree of adaptation between the target LoRA module and the user request is ensured, enabling the synthetic model to accurately invoke the knowledge and rules of the corresponding scenario. The entire process requires no manual intervention, achieving automation and intelligence in module selection, and improving the response speed and scenario adaptation accuracy of large models.

[0020] As a preferred embodiment, the routing network is obtained by training a preset classification model architecture based on the cross-entropy loss function.
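As an illustration of training a classifier with cross-entropy loss, the sketch below fits a tiny linear softmax model by stochastic gradient descent. The linear architecture, the three-element feature vectors (two hypothetical scenario indicators plus a bias term), the learning rate, and the epoch count are all assumptions; the text only states that a preset classification architecture is trained with cross-entropy.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_for(weights: list[list[float]], x: list[float]) -> list[float]:
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def train_router(features, labels, num_classes: int, lr: float = 0.5, epochs: int = 200):
    """Fit a linear softmax classifier; returns weights and mean loss per epoch."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    history = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for x, y in zip(features, labels):
            probs = softmax(logits_for(weights, x))
            epoch_loss += -math.log(probs[y])  # cross-entropy for the true class
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c is (prob - one_hot).
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
        history.append(epoch_loss / len(features))
    return weights, history

def predict(weights: list[list[float]], x: list[float]) -> int:
    scores = logits_for(weights, x)
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical features: [energy-power indicator, medical indicator, bias].
features = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
labels = [0, 1]  # index of the matching LoRA module
weights, history = train_router(features, labels, num_classes=2)
```

The class index predicted here plays the role of the selected LoRA module; the softmax output is the selection probability distribution.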

[0021] In this preferred scheme, the cross-entropy loss function possesses excellent discriminative ability in classification tasks, effectively enhancing the routing network's learning of the correspondence between scene identifiers and LoRA modules, and improving the accuracy of matching degree calculation. The classification model architecture trained based on this function enables the routing network to quickly capture the association features between input representations and modules, making the output selection probability distribution more valuable. This design avoids the randomness of routing decisions, ensuring that the selection of target LoRA modules better meets the needs of real-world scenarios, providing a guarantee for the efficient construction of synthetic models, and indirectly improving the accuracy and compliance of large-scale model inference responses.

[0022] As a preferred embodiment, the basic large model is a pre-trained large model that supports long context processing, and the target core parameters of the basic large model have been set to a frozen state.

[0023] This preferred solution supports long context processing, enabling the model to accurately capture key information in complex inputs, avoiding response biases caused by incomplete context understanding, and improving the ability to handle long text requirements. The core parameter freezing design preserves the powerful basic capabilities of the pre-trained model while avoiding the enormous computational resource consumption of full parameter fine-tuning, lowering the training threshold. Simultaneously, freezing core parameters prevents the degradation of the model's basic capabilities during training, ensuring the output stability of the synthetic model, and allowing the scenario-based optimization of the LoRA module to precisely target specific capability improvements, further enhancing the compliance and accuracy of the responses.

[0024] The present invention also provides a scenario compliance response output device that integrates dynamic routing and LoRA, including an acquisition module, an integration module and an inference module; The acquisition module is used to acquire user requests; The integration module is used to select a target LoRA module that matches the user request from the adapter library, and to obtain a synthetic model by integrating the target LoRA module with the basic large model; wherein each LoRA module in the adapter library is obtained by training a LoRA adapter on the basic large model with a different dataset, and each LoRA module is adapted to a different region and domain scenario; The inference module is used to respond to the user request based on the synthetic model and output an answer that meets the compliance requirements of the region and domain scenario corresponding to the target LoRA module.

[0025] The present invention also provides a storage medium storing a computer program, which is invoked and executed by a computer to implement a scenario compliance response output method for the integration of dynamic routing and LoRA as described above.

[0026] The present invention also provides a computer program product, including a computer program or instructions, which, when executed by a communication device, implements a scenario compliance response output method for dynamic routing and LoRA integration as described above.

Description of the Drawings

[0027] Figure 1 is a flowchart of a scenario compliance response output method that integrates dynamic routing and LoRA, provided in an embodiment of the present invention; Figure 2 is a schematic diagram of the preference data classification and weighted DPO process provided in an embodiment of the present invention; Figure 3 is a schematic diagram of the routing network structure provided in an embodiment of the present invention; Figure 4 is the overall system architecture diagram provided in an embodiment of the present invention; Figure 5 is a timing diagram of the system inference process provided in an embodiment of the present invention; Figure 6 is a schematic diagram of a scenario compliance response output device that integrates dynamic routing and LoRA, provided in an embodiment of the present invention.

Detailed Implementation

[0028] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0029] In the description of this invention, unless otherwise stated, "several" means two or more.

[0030] The present invention provides a scenario compliance response output method that integrates dynamic routing and LoRA, aiming to solve the core problems of high cost, poor scalability, poor alignment effect, and weak risk control capability in the existing technology of large model multi-scenario alignment. Specifically, it aims to inject and manage multiple cross-regional and cross-industry compliance and preference alignment capabilities into a single basic large model in a low-cost and high-efficiency manner. It designs a fine-grained and flexible training mechanism that can distinguish different regulatory risk levels and preference intensities and supports adjustable weights. It constructs an intelligent inference system that can automatically select the most matching domain alignment strategy according to the user input context and achieve dynamic and seamless multi-scenario adaptation. At the same time, it avoids the training complexity of traditional RLHF and PPO methods, adopts a more direct and efficient optimization path, and is compatible with diverse preference data and complex compliance requirements.

[0031] Example 1: Referring to Figure 1, the present invention provides a scenario compliance response output method that integrates dynamic routing and LoRA, including steps S1 to S3. The specific implementation steps are as follows:

S1. Obtain a user request.

[0032] Step S1 in this embodiment of the invention specifically includes: The system obtains a user request, which can either carry an explicit region and domain binding prefix or take the form of a natural language query or structured API call containing domain keywords. An example of the explicit-prefix format is "[R1-EP] Please explain the advantages of UHV power transmission," where the prefix "[R1-EP]" corresponds to "Region A - Energy and Power."
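Handling both request forms could start with a small parser like the one below. The prefix grammar (`R` plus digits, a hyphen, then an uppercase domain code) is inferred from the "[R1-EP]" and "[R2-MED]" examples and is an assumption, as are the names `ParsedRequest` and `parse_request`.

```python
import re
from typing import NamedTuple, Optional

# Assumed prefix grammar: "[R<digits>-<UPPERCASE DOMAIN>]" at the start of the request.
PREFIX_RE = re.compile(r"^\[(R\d+)-([A-Z]+)\]\s*(.*)$", re.DOTALL)

class ParsedRequest(NamedTuple):
    region: Optional[str]
    domain: Optional[str]
    body: str

def parse_request(text: str) -> ParsedRequest:
    """Split an explicit scenario prefix from the request body, if one is present."""
    stripped = text.strip()
    match = PREFIX_RE.match(stripped)
    if match is None:
        # No explicit prefix: the scenario has to be inferred from the content later.
        return ParsedRequest(None, None, stripped)
    region, domain, body = match.groups()
    return ParsedRequest(region, domain, body)

explicit = parse_request("[R1-EP] Please explain the advantages of UHV power transmission")
implicit = parse_request("Please explain the advantages of UHV power transmission")
```

Requests without a prefix come back with `region` and `domain` set to `None`, leaving scenario identification to the routing stage.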

[0033] S2. Select the target LoRA module that matches the user request from the adapter library, and obtain the synthetic model by integrating the target LoRA module and the base model. Each LoRA module in the adapter library is obtained by training LoRA adapters based on the base model and different datasets, and each LoRA module is adapted to different regions and domain scenarios.

[0034] Step S2 in this embodiment of the invention includes S2.1 to S2.6, wherein S2.1 establishes the basic large model, S2.2 establishes the dataset, S2.3 and S2.4 establish the adapter library based on the basic large model and the dataset, S2.5 establishes the routing network, and S2.6 establishes the synthetic model based on the user request, the routing network, the adapter library, and the basic large model, specifically as follows:

S2.1. Based on industry-recognized public benchmarks, select a high-performing pre-trained model that supports long context processing as the basic large model. Specifically, the model must meet multiple key indicators: general and multimodal capabilities must reach advanced industry levels, for example, a C-Eval benchmark score of no less than 72; the number of parameters must fall in the range of 7 billion to 70 billion or more; long context processing of at least 32k tokens must be supported; and the model architecture must be compatible with low-rank adaptation (LoRA) and other parameter-efficient fine-tuning techniques. In this embodiment of the invention, models such as Qwen2-72B-VL can be selected. Public benchmarks include Open-LM Leaderboard (Open Language Model Leaderboard), MMLU (Massive Multitask Language Understanding), and C-Eval (Chinese Evaluation).

[0035] After selecting the basic large model, its target core parameters are frozen while loading the model. This design aims to fully preserve the massive amount of world knowledge and general reasoning capabilities already learned by the basic large model, ensuring that subsequent alignment training for specific regions and domain scenarios focuses only on the newly added lightweight adapter parameters, avoiding interference with the core performance of the basic model, and guaranteeing training efficiency and effectiveness.

[0036] This embodiment, S2.1, supports long context processing, enabling the model to accurately capture key information in complex inputs, avoiding response biases caused by incomplete context understanding, and improving its ability to handle long text requests. The core parameter freezing design preserves the powerful basic capabilities of the pre-trained model while avoiding the enormous computational resource consumption of full parameter fine-tuning, lowering the training threshold. Simultaneously, freezing core parameters prevents the degradation of the model's basic capabilities during training, ensuring the output stability of the synthetic model. This allows the scenario-based optimization of the LoRA module to precisely target specific capability improvements, further enhancing the compliance and accuracy of the responses.

[0037] S2.2. For several target alignment scenarios formed by different regions and fields, collect publicly available information and authorized private information covering laws and regulations, industry standards, professional knowledge bases, and typical question-and-answer pairs. Add the corresponding structured prefix to the collected data for each type of scenario to construct a task fine-tuning dataset. This dataset supports multimodal data organization formats such as text and images, and its built-in scenario identification prefixes help subsequent models achieve accurate scenario recognition and route matching. The structured prefixes are scenario identifiers bound to regions and fields, such as [R1-EP] corresponding to Region A - Energy and Power, and [R2-MED] corresponding to Region B - Health.

[0038] Using the scenario guidance provided by the structured prefix, the basic large model is prompted to generate initial output results for the core target requirements of the several target alignment scenarios. The initial output results are then labeled with risk levels and preference strength ratings by a preset model, domain experts, or experienced users, forming an initial hierarchical preference dataset containing fields such as prompt, chosen_response, rejected_response, risk_level, pref_strength, region_tag, and domain_tag. The risk level divides the compliance risk associated with each preference pair into multiple levels, such as L1 to L4, with L1 corresponding to extremely high risk and L4 to low risk; the level reflects the severity of the violation of relevant regulations and the potential security risks or business consequences. For the preference strength rating, annotators first select the better answer and then state how strongly they prefer it, using a 1-5 point Likert scale or a three-level rating ("weak / medium / strong").
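One way to picture a record of the initial hierarchical preference dataset is the sketch below. The field names follow the list above; the class name PreferenceRecord, the validation logic, and the example texts are illustrative assumptions, and only the three-level strength variant is modeled.

```python
from dataclasses import dataclass, asdict

RISK_LEVELS = ("L1", "L2", "L3", "L4")         # L1 = extremely high risk, L4 = low risk
PREF_STRENGTHS = ("weak", "medium", "strong")  # three-level rating variant


@dataclass(frozen=True)
class PreferenceRecord:
    prompt: str
    chosen_response: str
    rejected_response: str
    risk_level: str
    pref_strength: str
    region_tag: str
    domain_tag: str

    def __post_init__(self) -> None:
        # Reject labels outside the agreed annotation scheme.
        if self.risk_level not in RISK_LEVELS:
            raise ValueError(f"unknown risk_level: {self.risk_level}")
        if self.pref_strength not in PREF_STRENGTHS:
            raise ValueError(f"unknown pref_strength: {self.pref_strength}")


# Made-up example content, for illustration only.
record = PreferenceRecord(
    prompt="[R1-EP] May maintenance start before the line is de-energized?",
    chosen_response="No. Work may begin only after de-energization is confirmed.",
    rejected_response="Yes, if the crew is experienced.",
    risk_level="L1",
    pref_strength="strong",
    region_tag="R1",
    domain_tag="EP",
)
print(asdict(record))
```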

[0039] Following a hierarchical weighting mechanism, samples with different risk levels and preference strengths in the initial hierarchical preference dataset are assigned corresponding weights. To ensure the consistency and reliability of the labeled data, a dual-annotator cross-verification mechanism is employed, and inter-annotator agreement is evaluated with statistical measures such as Cohen's kappa, resulting in the hierarchical preference dataset. The hierarchical weighting mechanism is set based on the risk level and preference strength information in the dataset: sample weights are positively correlated with the severity of the sample's risk level and with its preference strength, meaning that samples with high risk and strong preference are assigned higher weights. This guides the model to prioritize learning and adhering to key compliance requirements during training.
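For the dual-annotator check, Cohen's kappa compares the observed agreement with the agreement expected by chance. The following is a small self-contained sketch; the function name and the example labels are illustrative, not taken from the source.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Fraction of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


annotator_1 = ["L1", "L1", "L2", "L4"]
annotator_2 = ["L1", "L2", "L2", "L4"]
print(round(cohens_kappa(annotator_1, annotator_2), 4))  # 7/11, about 0.6364
```

A low kappa on a batch would suggest sending that batch back for re-annotation before weights are assigned; the acceptance threshold is a project decision the source does not specify.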

[0040] The dataset consists of a task fine-tuning dataset and a hierarchical preference dataset.

[0041] In this embodiment, S2.2, by binding regional and domain identifiers with structured prefixes, the task fine-tuning dataset inherently possesses scene attributes, avoiding model misjudgment of scenes. Simultaneously, by combining the preference ratings generated by the basic large model, objective data and subjective needs are integrated, resulting in a dataset more closely aligned with real-world application scenarios. The combination of publicly available and authorized private data ensures data legality and richness, while the tiered preference dataset provides clear guidance for subsequent model alignment, enabling the trained model to accurately respond to different scenario requirements, reduce meaningless output, and improve the compliance and practicality of the responses. Furthermore, the rating method enables refined processing of preference data, overcoming the limitations of traditional single-dimensional evaluation. The multi-dimensional rating of risk level and preference intensity comprehensively characterizes the value attributes of the output results, making the resulting initial dataset more meaningful. The hierarchical weighting mechanism, by setting weights positively correlated with risk level and preference intensity, prioritizes high-risk and high-preference-intensity samples during training, allowing the model to learn safe, compliant, and relevant knowledge first. This differentiated weight allocation avoids the problem of unclear model focus caused by balanced weights, strengthens risk control capabilities, and improves the quality and security of the model output.

[0042] S2.3. For several regional and domain scenarios, LoRA adapters are trained based on the basic large model and dataset to obtain several LoRA modules to form an adapter library. The specific construction process is as follows: In the target network layer of the basic large model, such as the self-attention module and feedforward network in Transformer, an independent LoRA adapter is injected in parallel for each region and domain combination under several target alignment scenarios. The rank of this adapter is set to 8 to 64 and the alpha value is set to 8 to 128. Its parameter count accounts for only 0.01% to 1% of the total parameters of the basic model, which is significantly lightweight. Among them, the LoRA adapter is a parameter-efficient large model fine-tuning technology component. Its core is to achieve rapid adaptation of the large model to specific scenarios, tasks or compliance requirements by training a small number of lightweight low-rank matrices without changing the core parameters of the pre-trained large model, while significantly reducing training and deployment costs.
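To see why the adapter stays within a small fraction of the base model, note that a rank-r LoRA adapter on a weight matrix of shape d_out × d_in adds two matrices of shapes d_out × r and r × d_in, i.e. r × (d_in + d_out) parameters. The sketch below applies this to a purely hypothetical model configuration (hidden size, layer count, number of adapted matrices, and base parameter count are all assumptions chosen for illustration).

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_in + d_out)


# Hypothetical configuration, not the model used in the source.
hidden_size = 4096
num_layers = 32
adapted_matrices_per_layer = 4      # e.g. four square attention projections
base_params = 7_000_000_000
rank = 16                           # inside the 8 to 64 range given above

total_lora = num_layers * adapted_matrices_per_layer * lora_param_count(
    hidden_size, hidden_size, rank
)
fraction = total_lora / base_params
print(total_lora)
print(f"{fraction:.4%}")
```

Under these assumptions the adapter lands at roughly a quarter of a percent of the base parameters, inside the 0.01% to 1% band stated above.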

[0043] For each region and domain combination, the corresponding scenario-specific data is retrieved from the task fine-tuning dataset. Then, relying on efficient distributed training frameworks such as PyTorch DDP (Distributed Data Parallel) or DeepSpeed ZeRO (Zero Redundancy Optimizer), a data parallel strategy is adopted to distribute the specific data for different scenarios to multiple computing nodes. At the same time, fp16 / bfloat16 mixed precision training and gradient accumulation are enabled, and a global batch size of no less than 256 is set to optimize memory usage efficiency and training speed. After receiving its allocated data, each computing node feeds it directly into the LoRA adapter for the corresponding region and domain combination and carries out independent, distributed, multi-task parallel training, so that each LoRA adapter accurately learns the specific technical parameters and basic compliance rules of its scenario. Here, fp16 is a 16-bit half-precision floating-point format, which is efficient but has a narrow numerical range; bfloat16 is a 16-bit brain floating-point format that keeps the wide exponent range of 32-bit floats at the cost of fewer mantissa bits, which reduces the risk of overflow and underflow. Both are low-precision formats used in mixed-precision training to reduce memory usage and improve training speed.
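With data parallelism and gradient accumulation, the global batch size is the per-device batch size times the number of devices times the number of accumulation steps. A minimal sketch of choosing the accumulation steps so that the global batch reaches at least 256 follows; the function names and the example device counts are assumptions.

```python
def accumulation_steps(per_device_batch: int, world_size: int,
                       min_global_batch: int = 256) -> int:
    """Smallest number of accumulation steps giving a global batch >= min_global_batch."""
    if per_device_batch <= 0 or world_size <= 0:
        raise ValueError("batch size and world size must be positive")
    per_step = per_device_batch * world_size
    return -(-min_global_batch // per_step)  # ceiling division


def global_batch_size(per_device_batch: int, world_size: int, steps: int) -> int:
    """Samples contributing to one optimizer update."""
    return per_device_batch * world_size * steps


steps = accumulation_steps(per_device_batch=4, world_size=8)
print(steps, global_batch_size(4, 8, steps))  # 8 256
```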

[0044] After every LoRA adapter has been trained on the task fine-tuning data of its target alignment scenario and meets the preset indicator requirements, its adaptation parameters are saved as a scene-specific independent file, such as R1-EP.lora or R2-MED.lora. This yields several scene-bound, plug-and-play initial LoRA modules, namely the LoRA adapters to be optimized.

[0045] The initial LoRA module training method in S2.3 of this embodiment achieves efficient scenario-based training, solving the problems of scenario interference and low efficiency in traditional training. The parallel injection of LoRA adapters freezes the core parameters of the base model while avoiding impact on its performance, ensuring training stability. Distributed multi-task parallel training allows each module to independently learn its corresponding scenario data, effectively avoiding cross-interference between different scenario knowledge and ensuring the purity of module knowledge. Targeted training with dedicated data enables each module to accurately master scenario knowledge and compliance rules. The trained scenario-binding modules provide a high-quality foundation for subsequent optimization, significantly improving the efficiency and quality of subsequent adapter library construction.

[0046] S2.4. Based on the scenario dimensions of region and domain, each initial LoRA module and its matching data from the hierarchical preference dataset are allocated to independent computing nodes of the distributed training system. On each computing node, based on the Direct Preference Optimization (DPO) algorithm, sample weights are calculated from the risk level and preference strength of the samples in the hierarchical preference dataset, for example w = weight_risk × weight_strength. One possible weight mapping is: L1 (extremely high risk) corresponds to a weight of 4 and strong preference corresponds to a weight of 2, and the product of the two is the total weight of the sample. The total weight of an "L1 risk + strong preference" sample is then 8, while the total weight of an "L4 risk + weak preference" sample can be set to 1. This mechanism makes the model show a stronger learning tendency toward high-risk, high-priority compliance requirements during optimization. The sample weights are substituted into the DPO loss function. For each sample, the actual log probability difference between the "target preferred answer" and the "target inferior answer" under the initial LoRA module is computed, with training aiming to maximize this difference. Based on the DPO loss function and the actual log probability difference, the weighted preference optimization loss value for the current training step is calculated, and the parameters of the initial LoRA modules are then updated by backpropagation using this loss value.
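The weighting described above can be sketched as a per-sample multiplier on the standard DPO loss. In this sketch the weights for L1, strong preference, and the "L4 + weak" total come from the text; the intermediate weights (L2, L3, medium), the value of beta, and the use of the frozen base model as the reference policy are assumptions made only for illustration.

```python
import math

# L1 -> 4 and strong -> 2 are from the text, with "L4 + weak" totalling 1.
# The values for L2, L3 and "medium" are illustrative assumptions.
RISK_WEIGHT = {"L1": 4.0, "L2": 3.0, "L3": 2.0, "L4": 1.0}
STRENGTH_WEIGHT = {"weak": 1.0, "medium": 1.5, "strong": 2.0}


def sample_weight(risk_level: str, pref_strength: str) -> float:
    """w = weight_risk * weight_strength."""
    return RISK_WEIGHT[risk_level] * STRENGTH_WEIGHT[pref_strength]


def weighted_dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
                      ref_chosen_logp: float, ref_rejected_logp: float,
                      weight: float, beta: float = 0.1) -> float:
    """Per-sample DPO loss, -log(sigmoid(beta * margin)), scaled by the sample weight."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # Numerically stable -log(sigmoid(z)).
    if z > 0:
        neg_log_sigmoid = math.log1p(math.exp(-z))
    else:
        neg_log_sigmoid = -z + math.log1p(math.exp(z))
    return weight * neg_log_sigmoid


w = sample_weight("L1", "strong")
print(w)  # 8.0
print(weighted_dpo_loss(-12.0, -15.0, -13.0, -14.0, weight=w))
```

Because the weight multiplies the loss, an "L1 + strong" pair contributes eight times the gradient of an otherwise identical "L4 + weak" pair under this mapping.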
During training, for high-risk domains such as healthcare and finance, additional "compliant refusal" sample pairs are added, where "politely and clearly refusing questions that violate regulations or exceed the model's capabilities, and explaining the reasons" is selected as the preferred answer. Simultaneously, comprehensive indicators such as domain task accuracy, compliance test pass rate, and refusal scenario accuracy on the validation set are continuously monitored. Based on these indicators, an early stopping strategy is set to avoid overfitting, allowing each initial LoRA module to automatically converge to its optimal alignment strategy within its scenario during independent training. Once several initial LoRA modules meet the preset indicator requirements, several optimized LoRA modules that fit the corresponding scenario's compliance requirements are formed. These LoRA modules constitute a dynamically callable adapter library, i.e., the LoRA adapter library.
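The early stopping strategy can be sketched as below. Averaging the three monitored indicators into one score and the patience value are assumptions; the source names the indicators but not how they are combined.

```python
from dataclasses import dataclass


@dataclass
class EarlyStopper:
    patience: int = 3            # evaluations without improvement before stopping
    min_delta: float = 0.0       # minimum gain that counts as an improvement
    best: float = float("-inf")
    bad_evals: int = 0

    def step(self, task_acc: float, compliance_pass_rate: float,
             refusal_acc: float) -> bool:
        """Record one validation round; return True when training should stop."""
        # Assumed composite: unweighted mean of the three indicators.
        score = (task_acc + compliance_pass_rate + refusal_acc) / 3.0
        if score > self.best + self.min_delta:
            self.best = score
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# Made-up validation history, for illustration only.
stopper = EarlyStopper(patience=2)
history = [(0.80, 0.80, 0.80), (0.90, 0.90, 0.90), (0.85, 0.85, 0.85), (0.85, 0.85, 0.85)]
for round_index, metrics in enumerate(history):
    if stopper.step(*metrics):
        print(f"stop after evaluation round {round_index}")
        break
```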

[0047] Referring to Figure 2, which is a schematic diagram of the preference data classification and weighted DPO process provided in the embodiment of the present invention: it shows how the risk level and preference strength of the preference data are mapped into weights and substituted into the DPO loss function to complete the adapter parameter update.

[0048] In this embodiment, S2.3-S2.4, the initial training based on task fine-tuning data enables each module to master the specific technical parameters and basic compliance rules for a particular scenario, ensuring the module's scenario-specific relevance. Weighted direct preference optimization is used for alignment reinforcement, which is more stable and efficient than traditional reinforcement learning methods based on human feedback, eliminating the need for complex reward model training and reducing computational costs. This two-stage training of basic capabilities and preference reinforcement enables the LoRA module to possess both professional capabilities and adaptability to user preferences and scenario requirements. The constructed adapter library provides high-quality module support for subsequent dynamic routing, improving the response quality of the synthetic model. Furthermore, the distributed node allocation enables parallel optimization of each module, significantly shortening training time and improving overall efficiency. Based on the direct preference optimization algorithm, and combined with the calculation of log probability differences using sample weights, the optimization objective is more closely aligned with scenario requirements, avoiding performance bias caused by indiscriminate optimization. Updating parameters using weighted loss values ensures the model prioritizes optimizing high-value samples, enabling modules to quickly meet preset performance requirements. The optimized LoRA module exhibits stronger scenario adaptability, and the built adapter library provides more reliable module support for large models, improving response quality during the inference phase.

[0049] S2.5. Train the pre-defined classification model architecture based on the cross-entropy loss function to obtain the routing network. The specific construction process is as follows: The routing network is designed with a lightweight architecture as its core principle. Its core task focuses on accurate adaptation selection during the inference phase. Specifically, it needs to automatically match the most suitable LoRA adapter based on information such as the prefix identifier, contextual semantics, and linguistic features of the user request. It also supports outputting the Top-K adapters and their corresponding probabilities to meet the needs of combinatorial or weighted inference in complex scenarios. Its network structure can employ an efficient small classification model architecture, such as a BERT-based text classifier or a multilayer perceptron (MLP). The input is the encoded token embedding of the user request text (prompt) and possible prefix identifier embeddings. The output layer outputs the selection probability distribution of all trained LoRA adapters through a Softmax classifier.
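The Softmax output and Top-K selection described above can be illustrated without any model code. The logits here are made-up numbers standing in for the routing network's output layer, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_adapters(logits: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k adapters with the highest selection probability."""
    names = list(logits)
    probs = softmax([logits[name] for name in names])
    ranked = sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Illustrative logits for three adapters.
router_logits = {"R1-EP": 2.0, "R2-EP": 0.0, "R2-MED": -1.0}
for name, prob in top_k_adapters(router_logits, k=2):
    print(name, round(prob, 3))
```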

[0050] By using semi-automatic methods such as evaluating the performance of each LoRA adapter on the validation set samples, or by directly using manual annotation, the "best adapter" label is defined for each training data sample, thereby building a supervised training dataset for the routing network.
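A minimal sketch of the semi-automatic labeling route: given a per-adapter quality score for each validation sample (how that score is computed is left open in the source and is assumed to exist here), the "best adapter" label is simply the highest-scoring adapter.

```python
from typing import Dict, List, Tuple


def build_router_dataset(prompts: List[str],
                         scores: List[Dict[str, float]]) -> List[Tuple[str, str]]:
    """Pair each prompt with the name of its highest-scoring adapter."""
    if len(prompts) != len(scores):
        raise ValueError("prompts and scores must have the same length")
    return [(prompt, max(s, key=s.get)) for prompt, s in zip(prompts, scores)]


# Made-up prompts and scores, for illustration only.
prompts = ["[R1-EP] grid safety question", "[R2-EP] reliability standard question"]
scores = [{"R1-EP": 0.91, "R2-EP": 0.40}, {"R1-EP": 0.35, "R2-EP": 0.88}]
print(build_router_dataset(prompts, scores))
```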

[0051] During the training of a small classification model architecture using a supervised training dataset, the standard cross-entropy loss function is used as the optimization objective to focus on improving the classification accuracy of the network, ensuring the precision of adapter selection, and obtaining the final routing network.
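For reference, the standard cross-entropy objective for a classifier over K adapters can be written as follows. The symbols are introduced here for illustration and do not appear in the source: N is the number of supervised samples, x_i the input representation, y_i the index of the "best adapter" label, and z_k the k-th output logit of the routing network with parameters θ.

```latex
p_\theta(k \mid x_i) = \frac{\exp\big(z_k(x_i)\big)}{\sum_{j=1}^{K} \exp\big(z_j(x_i)\big)},
\qquad
\mathcal{L}_{\mathrm{CE}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta(y_i \mid x_i)
```

Minimizing this loss raises the probability the routing network assigns to the labeled adapter for each sample.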

[0052] Furthermore, to ensure the reliability and adaptability of the inference phase, the routing network also needs to incorporate multi-dimensional robust design: during inference, user requests are first predicted, and the corresponding LoRA module is dynamically loaded based on the highest probability to achieve efficient dynamic selection; at the same time, a confidence threshold is set, and when the highest predicted probability is lower than the threshold, a fallback strategy is triggered, such as selecting a general default LoRA module or activating a preset compliant rejection rule; after the system goes online, feedback data on routing misjudgments are continuously collected, and online calibration and continuous learning are achieved through periodic fine-tuning to continuously improve the accuracy and adaptability of the network, completing the entire process of establishing the routing network from design, training to optimization.
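The confidence-threshold fallback can be sketched as follows. The threshold value, the adapter names GENERAL and COMPLIANT_REFUSAL, and the high_risk flag used to choose between the two fallbacks are all assumptions for illustration; the source does not specify how the two fallback options are chosen between.

```python
from typing import Dict

DEFAULT_ADAPTER = "GENERAL"    # hypothetical name for the general default LoRA module
REFUSE = "COMPLIANT_REFUSAL"   # hypothetical marker for the preset refusal rule


def select_adapter(probs: Dict[str, float], threshold: float = 0.7,
                   high_risk: bool = False) -> str:
    """Pick the most probable adapter, or fall back when confidence is too low."""
    best_name, best_prob = max(probs.items(), key=lambda item: item[1])
    if best_prob >= threshold:
        return best_name
    # Below the threshold: refuse for high-risk inputs, otherwise use the default module.
    return REFUSE if high_risk else DEFAULT_ADAPTER


print(select_adapter({"R1-EP": 0.92, "R2-EP": 0.08}))
print(select_adapter({"R1-EP": 0.55, "R2-EP": 0.45}))
print(select_adapter({"R1-EP": 0.55, "R2-EP": 0.45}, high_risk=True))
```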

[0053] Referring to Figure 3, which is a schematic diagram of the routing network structure provided in the embodiment of the present invention: it shows the structure of the routing network and the correspondence between input and output. The input is the user request after text preprocessing and embedding. After processing by the routing network core, which uses a lightweight Transformer or MLP architecture, the Softmax output layer outputs the selection probability distribution over the trained LoRA adapters, providing the decision basis for subsequent adapter selection.

[0054] In this embodiment S2.5, the cross-entropy loss function possesses excellent discriminative ability in classification tasks, effectively enhancing the routing network's learning of the correspondence between scene identifiers and LoRA modules, and improving the accuracy of matching degree calculation. The classification model architecture trained based on this function enables the routing network to quickly capture the association features between input representations and modules, making the output selection probability distribution more valuable. This design avoids the randomness of routing decisions, ensuring that the selection of target LoRA modules better meets the needs of actual scenarios, providing a guarantee for the efficient construction of synthetic models, and indirectly improving the accuracy and compliance of large model inference responses.

[0055] S2.6 Perform text preprocessing and embedding processing on the user request to generate input representation data containing input semantic information and a prefix identifier, where the prefix identifier is the scene identifier bound to the region and domain corresponding to the user request. The input representation data is fed into the routing network. The routing network compares the prefix identifier, parses the contextual semantics and linguistic features, calculates the degree of matching between the request and the structured prefix of the scene corresponding to each LoRA module in the adapter library, and outputs the selection probability distribution over all LoRA modules; it also supports outputting the Top-K highest-probability adapters and their probabilities to meet the needs of combinatorial or weighted inference in complex scenarios. Based on the selection probability distribution, the LoRA module with the highest probability is selected from the adapter library as the target LoRA module that matches the user's request. To avoid compliance risks caused by ambiguous scene recognition or unreliable matching, a confidence threshold is preset to verify the matching result. If the highest predicted probability is lower than this threshold, a robust fallback strategy is triggered in one of two ways. First, a general default LoRA module is selected, which integrates basic cross-scenario compliance requirements; after being combined with the basic large model, it generates a response based on general knowledge and basic compliance rules. Second, a preset compliant rejection rule is activated: for inputs that may involve high-risk content or have no clearly applicable scenario, the request is politely and clearly refused with a reason, ensuring the compliance and security of the response.

[0056] By integrating the target LoRA module and the basic large model through the inference engine, a synthetic model is obtained. This synthetic model retains the general knowledge and reasoning capabilities of the basic large model, while incorporating the specific scenario compliance rules, professional knowledge, and preference alignment capabilities corresponding to the target LoRA module. This achieves an accurate combination of general capabilities and scenario adaptation, while avoiding changes to the basic model and efficiently supporting compliance responses in multiple scenarios.
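One common way an inference engine can combine a LoRA module with a frozen weight is to add the scaled low-rank update, W' = W + (alpha / r) · B · A, leaving W itself unchanged. The source does not say whether the adapter is merged or applied alongside the base weights at run time, so the sketch below shows the merge variant only as an illustration, on tiny hand-written matrices.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x n) and b (n x p)."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def merge_lora(w: Matrix, lora_b: Matrix, lora_a: Matrix,
               alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * B @ A as a new matrix; W is not modified."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


base_w = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2 x 2)
lora_b = [[1.0], [0.0]]             # B: 2 x 1, rank 1
lora_a = [[0.0, 2.0]]               # A: 1 x 2
print(merge_lora(base_w, lora_b, lora_a, alpha=2.0, rank=1))  # [[1.0, 4.0], [0.0, 1.0]]
```

Because the merged matrix is built as a new object, swapping to another adapter only requires recomputing the update from the same untouched base weight.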

[0057] The selection method in S2.6 of this embodiment improves the accuracy and efficiency of LoRA module matching, solving the problem of chaotic module selection in multiple scenarios. The input representation includes both semantic and scenario prefix information, ensuring that the routing network can calculate similarity from both semantic and scenario matching dimensions. The probability distribution data output by the routing network provides a quantitative basis for selection, avoiding errors from subjective judgment. By selecting the module with the highest probability, a high degree of adaptation between the target LoRA module and the user request is ensured, enabling the synthetic model to accurately invoke the knowledge and rules of the corresponding scenario. The entire process requires no manual intervention, realizing the automation and intelligence of module selection, and improving the response speed and scenario adaptation accuracy of large models.

[0058] S3. Based on the synthetic model, respond to the user's request and output an answer that meets the compliance requirements of the target LoRA module's corresponding region and domain scenario.

[0059] Step S3 in this embodiment of the invention is specifically as follows: User requests are processed through a synthetic model. During the processing, a frozen base model serves as the core, relying on its massive general knowledge and reasoning capabilities. At the same time, the target LoRA module injects specific compliance rules, professional knowledge, and preference alignment capabilities for the corresponding region and domain scenario. The two work together to complete the core reasoning. In addition, auxiliary modules such as compliance rejection, legal citation, and output interpretability explanation are linked in sync to further enhance the security of decision-making and the transparency of output. Finally, an answer is generated that strictly follows the regulatory requirements of the target scenario and user preferences, and meets the compliance requirements of the region and domain scenario corresponding to the target LoRA module.

[0060] The following example demonstrates the implementation process of the parallel automatic alignment of energy sector regulations in regions A and B: ① Basic model: The Qwen2-72B-VL model with a C-Eval benchmark average score of ≥72 was selected, and all core parameters were frozen after loading.

[0061] ② Datasets: A Region A energy dataset (prefixed with [R1-EP]) and a Region B energy dataset (prefixed with [R2-EP]) were constructed to obtain the task fine-tuning dataset. The Region A energy dataset includes the Electricity Law and grid safety Q&A; the Region B energy dataset includes regulations from the Region B Energy Management Committee and standards from the Electricity Reliability Company. Regulatory experts from the energy fields of regions A and B were invited to rate and label the model's outputs in areas such as electricity safety, dispatch instructions, and environmental regulations. Responses involving significant safety hazards were specifically marked with the highest risk level, L1, and strong preference strength. After dual-annotator verification, weights were assigned to obtain the hierarchical preference dataset.

[0062] ③ Distributed LoRA fine-tuning: Based on the PyTorch DDP framework, fp16 mixed precision training and gradient accumulation are enabled to train independent R1-EP.lora and R2-EP.lora adapters in parallel on a multi-GPU cluster for the [R1-EP] and [R2-EP] datasets, respectively.

[0063] ④ Weighted DPO Optimization: Using the hierarchical preference dataset, weighted DPO optimization was performed on the R1-EP.lora adapter and the R2-EP.lora adapter respectively, focusing on strengthening the model's adherence to the key compliance requirements of the two regions, while monitoring indicators such as the rejection rate and the accuracy on security questions.

[0064] ⑤ Routing network training: Construct a classification model architecture consisting of an MLP and a 2-layer lightweight Transformer. The input includes the request token embedding and the prefix embedding. After training, the scene classification accuracy on the validation set is ≥99%, yielding the routing network.

[0065] ⑥ Inference Process Demonstration: When the system receives a request containing "Region A New Power System", the routing network automatically selects and loads the R1-EP.lora adapter. When the system receives a query about "NERC reliability standards", the routing network automatically switches to the R2-EP.lora adapter; if the matching confidence is insufficient, a general adapter or compliance rejection is triggered.

[0066] ⑦ Application scenario output: The system can adaptively generate professional answers that comply with the local regulatory system based on the regional background of the user's question, such as citing specific legal provisions and generating compliant policy summaries, ensuring the compliance, professionalism and reliability of the answers.

[0067] Referring to Figures 4 and 5: Figure 4 is the overall system architecture diagram provided by the embodiments of the present invention, which shows the parameter-frozen basic large model, the multiple scenario-specific LoRA adapter modules, and the routing network, and presents the interaction among the three throughout the training and inference process.

[0068] Figure 5 is a system inference flow sequence diagram provided in this embodiment of the invention, which shows the inference flow from receiving the user request, through route matching, loading the target LoRA adapter, and generating a response, to outputting the final result.

[0069] Overall, the embodiments of the present invention have the following beneficial effects: This invention trains LoRA adapters and builds an adapter library specifically for several regional and domain scenarios. It modularly embeds compliance rules for different scenarios into corresponding LoRA modules, transforming compliance logic from generalized and vague to scenario-specific and precise encapsulation, providing a technical carrier for adapting to different compliance rules. Secondly, it achieves dynamic routing through a matching mechanism between user requests and target LoRA modules, automatically identifying the region or domain scenario corresponding to a user request without manual intervention and accurately calling the LoRA module with built-in compliance rules for that scenario, solving the core problem of existing technologies' inability to automatically adapt. Thirdly, the lightweight nature of LoRA modules allows for integration with the basic large model without full retraining; only parameter fine-tuning is needed to quickly form a synthetic model. This retains the general reasoning capabilities of the basic large model while avoiding the inefficiency of full model training through modular integration, achieving high-efficiency response. Finally, the reasoning process of the synthetic model is constrained by the scenario-specific compliance rules in the matching LoRA modules, ensuring that the output answer strictly adheres to the compliance requirements of the corresponding region and domain. Simultaneously, the lightweight integration mode guarantees reasoning efficiency, ultimately achieving automatic scenario adaptation, efficient response to needs, and accurate output of compliant answers, comprehensively addressing the shortcomings of existing technologies. 
In summary, this invention significantly reduces the training cycle, computational resource consumption, and storage costs of multi-scenario alignment tasks by freezing the core parameters of the basic model and training only a lightweight LoRA adapter with a minimal number of parameters. When facing new regional or compliance scenario requirements, there is no need to modify the basic model; only the corresponding LoRA adapter needs to be trained and connected to the system to achieve low-cost, high-efficiency incremental capability expansion and flexible response to dynamic business changes. Furthermore, the innovative preference data hierarchical system and weighted DPO optimization mechanism guide the model to prioritize and strictly adhere to high-risk, high-priority compliance criteria, significantly enhancing the security and risk controllability of applications in key areas. The lightweight dynamic routing network can automatically match the optimal alignment strategy, effectively simplifying the application layer development process and the complexity of multi-model management, and significantly improving the model's adaptability and compliance robustness in complex scenarios involving multiple branches and cross-regional operations within an enterprise. The entire solution is highly automated, significantly reducing manual intervention costs and possessing strong potential for large-scale industrial deployment, efficiently adapting to dynamically changing scenario maintenance needs. Furthermore, this invention innovatively proposes and implements a highly engineered training and inference mechanism that integrates basic large-scale model parameter freezing, parallel training of multi-scenario LoRA adapters, multi-dimensional preference risk-based weighted optimization, and dynamic routing network selection. 
This invention provides a low-cost, easy-to-maintain, highly scalable, and secure solution for the compliant application of enterprise-level AI large-scale models under multiple regions and regulatory standards; it is particularly suitable for globalized industries with extremely high compliance requirements, such as energy, finance, and healthcare, and has broad industrial application prospects when handling multi-standard, multi-language text and multimodal tasks.

[0070] Example 2: Referring to Figure 6, the embodiments of the present invention provide a scenario compliance response output device that integrates dynamic routing and LoRA, including a data acquisition module 10, an integration module 20, and an inference module 30. The data acquisition module 10 is used to acquire user requests. The integration module 20 is used to select the target LoRA module that matches the user request from the adapter library, and to obtain the synthetic model by integrating the target LoRA module and the basic large model. Each LoRA module in the adapter library is obtained by training LoRA adapters based on the basic large model and different datasets, and each LoRA module is adapted to a different region and domain scenario. The inference module 30 is used to respond to user requests based on the synthetic model and output answers that meet the compliance requirements of the region and domain scenario corresponding to the target LoRA module.
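The division of labor among the three modules can be pictured with the toy sketch below. The class name, the method names, and the stand-in functions are hypothetical; each "synthetic model" is replaced by a plain function so that only the control flow is shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ComplianceResponseDevice:
    route: Callable[[str], str]                        # request -> adapter name
    adapter_library: Dict[str, Callable[[str], str]]   # adapter name -> model stand-in

    def acquire(self, raw_request: str) -> str:
        """Data acquisition module: obtain and normalize the user request."""
        return raw_request.strip()

    def integrate(self, request: str) -> Callable[[str], str]:
        """Integration module: pick the matching adapter and return the combined model."""
        return self.adapter_library[self.route(request)]

    def infer(self, raw_request: str) -> str:
        """Inference module: answer the request with the selected model."""
        request = self.acquire(raw_request)
        return self.integrate(request)(request)


def toy_route(request: str) -> str:
    # Stand-in for the routing network: a keyword rule, for illustration only.
    return "R1-EP" if "Region A" in request else "R2-EP"


device = ComplianceResponseDevice(
    route=toy_route,
    adapter_library={
        "R1-EP": lambda request: "A:" + request,
        "R2-EP": lambda request: "B:" + request,
    },
)
print(device.infer("  Region A grid code  "))
```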

[0071] It should be noted that the technical concept of this second embodiment is completely consistent with that of the first embodiment. The two maintain a high degree of synergy at the technical logic level. The specific technical details can be referred to the relevant description of the first embodiment, which will not be repeated here.

[0072] Example 3: This invention provides a computer-readable storage medium, which includes a stored computer program, wherein the computer program, when running, controls the device where the computer-readable storage medium is located to execute the aforementioned scenario compliance response output method for dynamic routing and LoRA fusion. The scenario compliance response output method integrating dynamic routing and LoRA, when implemented as a software functional unit and used as an independent product, can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the above embodiments can also be implemented by a computer program instructing related hardware. This computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. The computer-readable medium can include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a portable hard drive, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, etc.

[0073] In the above embodiments, implementation can be achieved entirely or partially through software, hardware, firmware, or any combination thereof. When implemented using software, it can be implemented entirely or partially in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are performed entirely or partially. The computer can be a general-purpose computer, a special-purpose computer, a computer network, a network device, a user equipment, or other programmable device. The computer program or instructions can be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another. For example, the computer program or instructions can be transferred from one website, computer, server, or data center to another website, computer, server, or data center via wired or wireless means. The computer-readable storage medium can be any available medium that a computer can access or a data storage device such as a server or data center that integrates one or more available media. The available medium can be a magnetic medium, such as a floppy disk, hard disk, or magnetic tape; it can also be an optical medium, such as a digital video optical disc; or it can be a semiconductor medium, such as a solid-state drive. The computer-readable storage medium may be a volatile or non-volatile storage medium, or may include both types of storage media.

[0074] The above are preferred embodiments of the present invention. It should be noted that, for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications are also considered to be within the scope of protection of the present invention.

Claims

1. A scenario compliance response output method integrating dynamic routing and LoRA, characterized in that it comprises: acquiring a user request; selecting, from an adapter library, a target LoRA module that matches the user request, and obtaining a synthetic model by integrating the target LoRA module with a base model, wherein each LoRA module in the adapter library is obtained by training a LoRA adapter based on the base model and a different dataset, and each LoRA module is adapted to a different region and domain scenario; and responding to the user request based on the synthetic model and outputting an answer that meets the compliance requirements of the region and domain scenario corresponding to the target LoRA module.

2. The scenario compliance response output method integrating dynamic routing and LoRA as described in claim 1, characterized in that the dataset is obtained as follows: for several target alignment scenarios in different regions and domains, collecting publicly available data and authorized private data; adding a corresponding structured prefix to the collected data to obtain a task fine-tuning dataset, wherein the structured prefix is a scenario identifier bound to the region and domain; based on the structured prefix, guiding the base model to generate initial output results for the target core requirements of the several target alignment scenarios, and rating the initial output results for preference intensity to obtain a hierarchical preference dataset; wherein the dataset consists of the task fine-tuning dataset and the hierarchical preference dataset.

3. The scenario compliance response output method integrating dynamic routing and LoRA as described in claim 2, characterized in that rating the initial output results for preference intensity to obtain the hierarchical preference dataset specifically comprises: classifying the initial output results by risk level and rating them for preference intensity to form an initial hierarchical preference dataset; and, according to a hierarchical weighting mechanism, assigning corresponding weights to samples with different risk levels and preference intensities in the initial hierarchical preference dataset to obtain the hierarchical preference dataset; wherein the hierarchical weighting mechanism is set based on the risk level and preference classification information in the hierarchical preference dataset, and the weights are positively correlated with the risk level and preference intensity of the samples.

4. The scenario compliance response output method integrating dynamic routing and LoRA as described in claim 2, characterized in that obtaining each LoRA module in the adapter library by training a LoRA adapter based on the base model and a different dataset, with each LoRA module adapted to a different region and domain scenario, specifically comprises: based on the task fine-tuning dataset in the dataset, independently training, in the network layers of the base model, a LoRA adapter for each region and domain scenario combination in the several target alignment scenarios to obtain several initial LoRA modules; and, based on the hierarchical preference dataset in the dataset, aligning and strengthening the initial LoRA modules by weighted direct preference optimization to obtain the LoRA modules that constitute the adapter library.

5. The scenario compliance response output method integrating dynamic routing and LoRA as described in claim 4, characterized in that independently training, based on the task fine-tuning dataset in the dataset and in the network layers of the base model, a LoRA adapter for each region and domain scenario combination in the several target alignment scenarios to obtain several initial LoRA modules specifically comprises: injecting, in parallel, an independent LoRA adapter into the target network layers of the base model for each region and domain combination in the several target alignment scenarios; retrieving, from the task fine-tuning dataset, the task fine-tuning data corresponding to each region and domain combination, and inputting the retrieved data into the LoRA adapter dedicated to that region and domain combination for distributed multi-task parallel training, so that each LoRA adapter learns the scenario-specific technical parameters and basic compliance rules of its corresponding scenario; and, after every LoRA adapter has been trained on the task fine-tuning data of its target alignment scenario until it meets the preset index requirements, obtaining the initial LoRA modules bound to the scenarios.

6. The scenario compliance response output method integrating dynamic routing and LoRA as described in claim 4, characterized in that aligning and strengthening the initial LoRA modules by weighted direct preference optimization, based on the hierarchical preference dataset in the dataset, to obtain the LoRA modules that constitute the adapter library specifically comprises: allocating, by scenario, each of the initial LoRA modules together with its matching data from the hierarchical preference dataset to an independent computing node of a distributed training system; on each computing node, based on the direct preference optimization algorithm, substituting the sample weights of the hierarchical preference dataset into the direct preference optimization loss function, and obtaining an actual log-probability difference by maximizing the log-probability difference between the target preferred answer and the target non-preferred answer generated by the initial LoRA module, wherein the sample weights are calculated from the risk level and preference intensity of the corresponding samples; calculating the weighted preference optimization loss value for the current training step from the direct preference optimization loss function and the actual log-probability difference; and updating the parameters of the initial LoRA module by backpropagation based on the weighted preference optimization loss value until the initial LoRA module meets the preset index requirements, thereby obtaining the optimized LoRA modules of which the adapter library is composed.

7. The scenario compliance response output method integrating dynamic routing and LoRA as described in claim 1, characterized in that selecting, from the adapter library, the target LoRA module that matches the user request specifically comprises: preprocessing and embedding the user request to generate input representation data containing input semantic information and a prefix identifier, wherein the prefix identifier is a scenario identifier bound to the region and domain corresponding to the user request; passing the input representation data into a routing network, calculating through the routing network the degree of matching between the prefix identifier and the structured prefix of the scenario corresponding to each LoRA module in the adapter library, and outputting selection probability distribution data over all LoRA modules; and, based on the selection probability distribution data, selecting the LoRA module with the highest probability from all LoRA modules in the adapter library as the target LoRA module that matches the user request.

8. The scenario compliance response output method integrating dynamic routing and LoRA as described in claim 7, characterized in that the routing network is obtained by training a predefined classification model architecture with a cross-entropy loss function.

9. The scenario compliance response output method integrating dynamic routing and LoRA as described in any one of claims 1 to 8, characterized in that the base model is a pre-trained large model that supports long-context processing, and the target core parameters of the base model have been set to a frozen state.

10. A scenario compliance response output device integrating dynamic routing and LoRA, characterized in that it comprises a data acquisition module, an integration module, and an inference module; the data acquisition module is used to acquire a user request; the integration module is used to select, from an adapter library, a target LoRA module that matches the user request, and to obtain a synthetic model by integrating the target LoRA module with a base model, wherein each LoRA module in the adapter library is obtained by training a LoRA adapter based on the base model and a different dataset, and each LoRA module is adapted to a different region and domain scenario; and the inference module is used to respond to the user request based on the synthetic model and to output an answer that meets the compliance requirements of the region and domain scenario corresponding to the target LoRA module.

11. A storage medium, characterized in that the storage medium stores a computer program which, when called and executed by a computer, implements the scenario compliance response output method integrating dynamic routing and LoRA as described in any one of claims 1 to 9.

12. A computer program product comprising a computer program or instructions, characterized in that, when the computer program or instructions are executed by a communication device, they implement the scenario compliance response output method integrating dynamic routing and LoRA as described in any one of claims 1 to 9.
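
For illustration only, the sample weighting of claim 3 and the weighted direct preference optimization loss of claim 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and does not limit the claims: the linear weighting formula and its coefficients are hypothetical (the claims only require that the weight increase with risk level and preference intensity), and the reference-model log probabilities and the temperature `beta` follow the commonly used formulation of direct preference optimization rather than anything stated in this document.

```python
import math


def sample_weight(risk_level: int, preference_intensity: int,
                  risk_coef: float = 0.5, pref_coef: float = 0.25) -> float:
    """Hypothetical weighting: grows with both risk level and preference intensity."""
    return 1.0 + risk_coef * risk_level + pref_coef * preference_intensity


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def weighted_dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
                      ref_chosen_logp: float, ref_rejected_logp: float,
                      weight: float, beta: float = 0.1) -> float:
    """Per-sample loss: -weight * log sigmoid(beta * margin).

    margin is the policy's log-probability difference between the preferred and
    non-preferred answers, measured relative to the same difference under a
    frozen reference model (an assumption taken from standard DPO).
    """
    policy_diff = policy_chosen_logp - policy_rejected_logp
    ref_diff = ref_chosen_logp - ref_rejected_logp
    margin = policy_diff - ref_diff
    return -weight * log_sigmoid(beta * margin)


def batch_loss(samples: list) -> float:
    """Mean weighted loss over a batch of dicts with the fields used below."""
    losses = [
        weighted_dpo_loss(
            s["policy_chosen_logp"], s["policy_rejected_logp"],
            s["ref_chosen_logp"], s["ref_rejected_logp"],
            sample_weight(s["risk_level"], s["preference_intensity"]),
        )
        for s in samples
    ]
    return sum(losses) / len(losses)
```

Under these assumptions, a high-risk sample contributes proportionally more to the loss than a low-risk sample with the same margin, and widening the log-probability difference in favor of the preferred answer lowers the loss.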