Large Language Model Compression Method and Related Devices for Power Business
By constructing a dual-track distillation framework that combines reasoning thought chain with tool call information, the deployment challenge of large language models in the power business field is solved, achieving efficient and accurate edge deployment and real-time decision-making, adapting to multiple scenarios and supporting flexible iteration.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: CHINA ELECTRIC POWER RESEARCH INSTITUTE CO LTD
- Filing Date: 2025-09-29
- Publication Date: 2026-06-30
Abstract figure: CN121303372B_ABST (image not reproduced)
Abstract
Description
Technical Field
[0001] This invention belongs to the field of power intelligence and relates to a method and related apparatus for compressing large language models for power business.
Background Technology
[0002] As renewable-energy power grids grow significantly in scale and complexity, traditional simulation and decision-making methods handle massive, high-dimensional data inefficiently and struggle to meet the real-time and accuracy requirements of power systems. Artificial intelligence technology has been introduced to improve computational simulation capabilities, and large language models for power business are an important component of it. However, although large language models for power business have powerful reasoning and decision-making capabilities, their high inference latency and heavy demands on computing power and hardware performance make them difficult to deploy on site or at the edge in the power industry.
[0003] To address the deployment challenges of large language models for power business, various model compression techniques have been proposed, including pruning, quantization, and knowledge distillation. However, these techniques have limitations in the power domain. Pruning reduces model size by removing unimportant connections or parameters, but excessive or improper pruning can significantly reduce model accuracy. Furthermore, the pruned model structure may become sparse, making it difficult to run efficiently on general-purpose hardware and sometimes even increasing storage overhead. Quantization reduces model size and accelerates inference by lowering the numerical precision of model parameters, but excessive quantization can also cause accuracy loss. Moreover, hardware platforms differ in their support for quantization, which can cause deployment compatibility issues. Knowledge distillation transfers knowledge from a large teacher model to a small student model, but its effectiveness depends heavily on the quality of the teacher model, the capacity of the student model, and the choice of distillation strategy. Additionally, in the power domain, knowledge distillation based on fixed text tends to make small models generate false facts and fails to teach them the underlying logic of tool calls, resulting in poor compression performance.
Summary of the Invention
[0004] The purpose of this invention is to overcome the shortcomings of the prior art and provide a large language model compression method and related apparatus for power business.
[0005] To achieve the above objectives, the present invention employs the following technical solution:
[0006] In a first aspect, this invention provides a method for compressing a large language model for power business, comprising: acquiring training samples based on a large language model for power business; wherein the training samples include an input question, a reasoning thought chain, tool call information, and a reasoning answer; constructing a lightweight model and setting a parallel thought chain generation head and a tool call head after the lightweight model to obtain a compressed model to be trained; wherein the lightweight model is used to perform reasoning based on the input question, reasoning steps, and call results and output the reasoning results; the thought chain generation head is used to generate reasoning steps based on the reasoning results and feed them back to the lightweight model; the tool call head is used to determine whether a tool needs to be called based on the reasoning results and, when a tool needs to be called, to call the tool and feed the call results back to the lightweight model; and training the compressed model to be trained based on the training samples to obtain the compressed model.
[0007] Optionally, when obtaining training samples based on the large language model for power business, several training samples may be obtained for the same input question.
[0008] Optionally, the construction of the lightweight model includes: constructing a lightweight model using a Transformer architecture or a long short-term memory network.
[0009] Optionally, when acquiring training samples based on the large language model for power business, the tool call actions in the tool call information are encoded as tokens or structured instructions; the tool call head calling the tool when a call is needed includes: the tool call head generating, based on the reasoning result, a call instruction encoded as a token or structured instruction and sending it to the call interface; wherein the call instruction is used to trigger the call interface to obtain the call parameters, call the tool, and feed back the call result.
[0010] Optionally, when training the compressed model to be trained based on the plurality of training samples, the following loss function is used:
[0011]
$$\mathcal{L} = \alpha\,\mathcal{L}_{\mathrm{CoT}} + \beta\,\mathcal{L}_{\mathrm{ans}} + \gamma\,\mathcal{L}_{\mathrm{tool}}$$
[0012] where $\mathcal{L}_{\mathrm{CoT}}$ is the error between each reasoning step of the compressed model to be trained and the reasoning thought chain of the training samples; $\mathcal{L}_{\mathrm{ans}}$ is the error between the final reasoning result of the compressed model to be trained and the reasoning answer of the training samples; $\mathcal{L}_{\mathrm{tool}}$ is the error between the tool calls of the compressed model to be trained and the tool call information of the training samples; and $\alpha$, $\beta$, and $\gamma$ are preset weighting coefficients.
[0013] In a second aspect, the present invention provides a large language model compression system for power business, comprising: a sample acquisition module for acquiring training samples based on a large language model for power business; wherein the training samples include an input question, a reasoning thought chain, tool call information, and a reasoning answer; a model construction module for constructing a lightweight model and setting a parallel thought chain generation head and a tool call head after the lightweight model to obtain a compressed model to be trained; wherein the lightweight model is used to perform reasoning based on the input question, reasoning steps, and call results and output the reasoning results; the thought chain generation head is used to generate reasoning steps based on the reasoning results and feed them back to the lightweight model; the tool call head is used to determine whether a tool needs to be called based on the reasoning results and, when a tool needs to be called, to call the tool and feed the call results back to the lightweight model; and a model compression module for training the compressed model to be trained based on the training samples to obtain the compressed model.
[0014] Optionally, when obtaining training samples based on the large language model for power business, several training samples may be obtained for the same input question.
[0015] Optionally, the construction of the lightweight model includes: constructing a lightweight model using a Transformer architecture or a long short-term memory network.
[0016] Optionally, when acquiring training samples based on the large language model for power business, the tool call actions in the tool call information are encoded as tokens or structured instructions; the tool call head calling the tool when a call is needed includes: the tool call head generating, based on the reasoning result, a call instruction encoded as a token or structured instruction and sending it to the call interface; wherein the call instruction is used to trigger the call interface to call the tool and feed back the call result.
[0017] Optionally, when training the compressed model to be trained based on the plurality of training samples, the following loss function is used:
[0018]
$$\mathcal{L} = \alpha\,\mathcal{L}_{\mathrm{CoT}} + \beta\,\mathcal{L}_{\mathrm{ans}} + \gamma\,\mathcal{L}_{\mathrm{tool}}$$
[0019] where $\mathcal{L}_{\mathrm{CoT}}$ is the error between each reasoning step of the compressed model to be trained and the reasoning thought chain of the training samples; $\mathcal{L}_{\mathrm{ans}}$ is the error between the final reasoning result of the compressed model to be trained and the reasoning answer of the training samples; $\mathcal{L}_{\mathrm{tool}}$ is the error between the tool calls of the compressed model to be trained and the tool call information of the training samples; and $\alpha$, $\beta$, and $\gamma$ are preset weighting coefficients.
[0020] In a third aspect, the present invention provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the above-described large language model compression method for power business.
[0021] In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described large language model compression method for power business.
[0022] Compared with the prior art, the present invention has the following beneficial effects:
[0023] This invention provides a large language model compression method for power business. Through knowledge distillation, it compresses the large language model for power business into a lightweight architecture, overcoming the limitation of traditional knowledge distillation, which focuses only on textual reasoning. It integrates model compression closely with the needs of power business operations by constructing a dual-track distillation framework that combines reasoning thought chains with tool call information. On one hand, the large language model for power business serves as the teacher model and generates multimodal training samples containing input questions, reasoning thought chains, tool call information, and reasoning answers, enabling the compressed model to learn chain-like reasoning logic. On the other hand, a thought chain generation head and a tool call head are introduced into the compressed model to be trained so that reasoning steps and tool call decisions are optimized simultaneously. This ensures that the decision output contains a complete reasoning chain and, by combining reasoning with physical models of the power system, avoids the bias of pure language reasoning, thereby improving decision accuracy and meeting system regulatory requirements. Furthermore, dynamically injecting tool call results during reasoning forms a closed loop that integrates power grid operation data or simulation results in real time. This resolves the disconnect between pure language reasoning and the physical laws of power systems, reducing reasoning errors and improving adaptability to complex scenarios. Ultimately, the compressed model balances efficiency, interpretability, and domain expertise, reducing inference latency and computing power requirements and meeting the edge deployment and real-time decision-making needs of the power industry.
Furthermore, the compression method covers multiple scenarios and supports flexible iteration, with significantly better generalization than traditional methods, enabling rapid adaptation to new scenarios such as renewable energy grid connection.
Attached Figure Description
[0024] Figure 1 is a flowchart of a large language model compression method for power business according to an embodiment of the present invention.
[0025] Figure 2 is a schematic diagram of the inference process of the compressed model according to an embodiment of the present invention.
[0026] Figure 3 is a block diagram of a large language model compression system for power business according to an embodiment of the present invention.
Detailed Implementation
[0027] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.
[0028] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms so used are interchangeable where appropriate, so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0029] First, the terminology used in the embodiments of this invention is introduced:
[0030] Large language model: a deep-learning-based natural language processing model with powerful language understanding and generation capabilities. Although large language models for the power industry can process text data and accelerate licensing and approval processes, their deployment on the business side is limited.
[0031] Knowledge distillation: a model compression technique that reduces parameter size and computational cost while maintaining performance by having a smaller model learn the output or intermediate features of a larger model.
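For reference, the common temperature-scaled soft-target form of knowledge distillation can be sketched as follows. This illustrates only the generic technique in the definition above, not the dual-track loss of this invention; the temperature value, the logits, and the function names are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Soft-target distillation loss: T^2 * KL(teacher_T || student_T)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


# Hypothetical teacher and student logits for a 3-class decision.
loss = distillation_loss([4.0, 1.0, 0.5], [2.5, 1.2, 0.8])
```

A higher temperature softens both distributions so that the student also learns the teacher's relative preferences among the non-top classes; the `T**2` factor is a conventional rescaling that keeps the loss magnitude comparable across temperatures.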
[0032] Chain reasoning: the process by which a model derives an answer step by step by generating intermediate reasoning steps. It can improve the accuracy and interpretability of multi-step reasoning tasks and is usually most effective in large models.
[0033] Transformer architecture: A deep learning architecture based on self-attention mechanism, widely used in natural language processing tasks, characterized by strong parallel computing capabilities and good long sequence modeling performance.
[0034] Joint loss function: A loss function that combines multiple loss optimization objectives. It is generally used to balance the training priority of each optimization objective through weight coefficients.
[0035] The present invention will now be described in further detail with reference to the accompanying drawings:
[0036] Referring to Figure 1, in one embodiment of the present invention, a large language model compression method for power business is provided. This method compresses a large language model for power business while meeting the high accuracy requirements of the power system business side, producing a miniaturized model that runs efficiently on resource-constrained edge devices.
[0037] Specifically, the large language model compression method for power business of this invention includes the following steps:
[0038] S1: Obtain training samples based on a large language model for power business; wherein, the training samples include input questions, reasoning thought chains, tool call information, and reasoning answers.
[0039] S2: Construct a lightweight model and set a parallel thought chain generation head and tool call head after the lightweight model to obtain the compressed model to be trained; wherein the lightweight model is used to perform reasoning based on the input question, reasoning steps, and call results and to output reasoning results; the thought chain generation head is used to generate reasoning steps based on the reasoning results and feed them back to the lightweight model; and the tool call head is used to determine, based on the reasoning results, whether a tool needs to be called, to call the tool when necessary, and to feed the call results back to the lightweight model.
[0040] S3: Train the compressed model to be trained based on the training samples to obtain the compressed model.
[0041] This invention provides a large language model compression method for power business. Through knowledge distillation, it compresses the large language model for power business into a lightweight architecture, overcoming the limitation of traditional knowledge distillation, which focuses only on textual reasoning. It integrates model compression closely with the needs of power business operations by constructing a dual-track distillation framework that combines reasoning thought chains with tool call information. On one hand, the large language model for power business serves as the teacher model and generates multimodal training samples containing input questions, reasoning thought chains, tool call information, and reasoning answers, enabling the compressed model to learn chain-like reasoning logic. On the other hand, a thought chain generation head and a tool call head are introduced into the compressed model to be trained so that reasoning steps and tool call decisions are optimized simultaneously. This ensures that the decision output contains a complete reasoning chain and, by combining reasoning with physical models of the power system, avoids the bias of pure language reasoning, thereby improving decision accuracy and meeting system regulatory requirements. Furthermore, dynamically injecting tool call results during reasoning forms a closed loop that integrates power grid operation data or simulation results in real time. This resolves the disconnect between pure language reasoning and the physical laws of power systems, reducing reasoning errors and improving adaptability to complex scenarios. Ultimately, the compressed model balances efficiency, interpretability, and domain expertise, reducing inference latency and computing power requirements and meeting the edge deployment and real-time decision-making needs of the power industry.
Furthermore, the compression method covers multiple scenarios and supports flexible iteration, with significantly better generalization than traditional methods, enabling rapid adaptation to new scenarios such as renewable energy grid connection.
[0042] Illustratively, when acquiring training samples based on the large language model for power business, input questions are designed for typical power business scenarios such as fault location and power flow calculation. The large language model for power business then outputs a complete reasoning thought chain, tool call information, and a reasoning answer. The reasoning thought chain consists of the model's reasoning steps arranged in chronological order. The tool call information is generated by recording the trajectory of each tool call made during the model's reasoning process; this trajectory includes the type of tool called (such as a power grid simulation module or an equipment status query API), the parameters input to the tool, and the call result returned by the tool.
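The sample structure described above might be represented as follows. This is a minimal sketch: the class names, field names, tool identifiers, and the fault-location example are hypothetical and are not specified by the invention.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToolCallRecord:
    """One entry of the tool call trajectory recorded from the teacher model."""
    tool_type: str              # e.g. "grid_simulation", "device_status_query"
    parameters: Dict[str, Any]  # parameters passed to the tool
    result: Any                 # result returned by the tool


@dataclass
class TrainingSample:
    """A distillation sample: question, reasoning chain, tool calls, answer."""
    input_question: str
    reasoning_chain: List[str] = field(default_factory=list)  # chronological steps
    tool_calls: List[ToolCallRecord] = field(default_factory=list)
    reasoning_answer: str = ""


# Hypothetical fault-location sample.
sample = TrainingSample(
    input_question="Locate the fault on feeder F12 after the 10:32 trip.",
    reasoning_chain=[
        "The protection relay on F12 tripped; check the section switches.",
        "Query the status of section switches S3 and S4.",
        "S3 reports fault current and S4 does not, so the fault lies between them.",
    ],
    tool_calls=[
        ToolCallRecord(
            tool_type="device_status_query",
            parameters={"devices": ["S3", "S4"]},
            result={"S3": "fault_current", "S4": "normal"},
        )
    ],
    reasoning_answer="Fault located in the F12 section between switches S3 and S4.",
)
```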
[0043] Illustratively, the training samples are designed around reasoning thought chains and tool call information, fully integrating natural language reasoning with tool calls to form a "thinking-action-observation" cycle based on the large language model for power business. This gives the compressed model to be trained a learnable reasoning paradigm and tool usage logic, so that the compressed model can operate efficiently while maintaining high accuracy on the power system business side.
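One way to flatten a recorded trajectory into the "thinking-action-observation" cycle for training is sketched below. The `Thought:`/`Action:`/`Observation:` text template and the JSON encoding of actions are assumptions made for illustration; the invention does not prescribe a serialization format.

```python
import json
from typing import Any, Dict, List, Optional


def serialize_trajectory(question: str, steps: List[Dict[str, Any]], answer: str) -> str:
    """Flatten a teacher trajectory into thought-action-observation text.

    Each step is a dict with a "thought" and, optionally, an "action"
    (tool name plus arguments) and the "observation" the tool returned.
    """
    lines = [f"Question: {question}"]
    for step in steps:
        lines.append(f"Thought: {step['thought']}")
        action: Optional[Dict[str, Any]] = step.get("action")
        if action is not None:
            lines.append(f"Action: {json.dumps(action, sort_keys=True)}")
            lines.append(f"Observation: {json.dumps(step.get('observation'), sort_keys=True)}")
    lines.append(f"Answer: {answer}")
    return "\n".join(lines)


# Hypothetical power flow example with one tool call.
example = serialize_trajectory(
    "Is line L7 overloaded under the current dispatch?",
    [
        {"thought": "Run a power flow for the current operating point.",
         "action": {"tool": "grid_simulation", "args": {"case": "current"}},
         "observation": {"L7_loading_pct": 92.0}},
        {"thought": "Loading is 92%, below the 100% thermal limit."},
    ],
    "No, line L7 is not overloaded (92% loading).",
)
```

Keeping each observation directly after the action that produced it preserves the causal order the compressed model is meant to imitate.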
[0044] Illustratively, a thought chain generation head is added to the lightweight model to predict the intermediate reasoning steps of the large language model for power business. This gives the compressed model a coherent chain-like thinking process that supports its subsequent reasoning, and helps it learn and reproduce the teacher's intermediate reasoning logic, improving the interpretability and accuracy of its reasoning. A tool call head, which can be arranged in parallel with the thought chain generation head, is also added to the lightweight model. This head determines whether to call an external tool and calls it when necessary, for example the power grid simulation interface. Tool call decisions are thereby modeled explicitly, providing data support for subsequent reasoning and avoiding a disconnect between pure language reasoning and the physical laws of power systems. At the same time, the design is independent of the compression framework: chain-like reasoning and tool calling capabilities are embedded through data distillation and joint training. This enriches the structure of the lightweight model and adapts it to different application scenarios.
[0045] Optionally, the thought chain generation head can be built on a Transformer decoder structure, and the tool call head can be built on a multilayer perceptron (MLP) combined with a classifier.
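A minimal sketch of a tool call head built as an MLP followed by a softmax classifier is shown below, assuming the head maps the lightweight model's hidden state to one of a small set of hypothetical actions, with "no_call" meaning that no tool is needed. Pure-Python lists stand in for tensors, and the weights in the usage example are hand-set rather than trained.

```python
import math
from typing import List, Sequence

# Hypothetical action space: index 0 means "no tool call".
TOOL_LABELS = ["no_call", "grid_simulation", "device_status_query"]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, x) for x in v]


def linear(w: Sequence[Sequence[float]], b: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute W x + b, with W given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def tool_call_head(hidden, w1, b1, w2, b2):
    """MLP + classifier: hidden state -> probabilities over TOOL_LABELS.

    Returns (chosen label, probabilities). "no_call" means the lightweight
    model continues reasoning without invoking a tool.
    """
    probs = softmax(linear(w2, b2, relu(linear(w1, b1, hidden))))
    best = max(range(len(probs)), key=probs.__getitem__)
    return TOOL_LABELS[best], probs


# Usage with hand-set (untrained) weights: hidden size 2, MLP width 2.
label, probs = tool_call_head(
    hidden=[1.0, -1.0],
    w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
    w2=[[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]], b2=[0.0, 0.0, 0.0],
)
```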
[0046] For example, when training the compressed model to be trained based on the training samples, training can be performed using edge-cloud collaboration or federated learning. Edge-cloud collaboration can involve pre-training with massive amounts of data in the cloud and fine-tuning at the edge based on specific power tasks.
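The text names federated learning without fixing an algorithm. As one common choice, the sketch below uses FedAvg-style aggregation, in which the cloud averages the parameters returned by edge nodes, weighting each node by its number of local training samples. The parameter names and sample counts are hypothetical.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


def federated_average(client_updates: List[Tuple[Params, int]]) -> Params:
    """FedAvg-style aggregation: average each parameter vector across edge
    clients, weighting every client by the number of local samples it used."""
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("at least one client must report local samples")
    names = client_updates[0][0].keys()
    averaged: Params = {}
    for name in names:
        length = len(client_updates[0][0][name])
        averaged[name] = [
            sum(params[name][i] * n for params, n in client_updates) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical round with two substations fine-tuning the tool call head.
global_params = federated_average([
    ({"tool_head.w": [1.0, 2.0]}, 100),  # substation A, 100 local samples
    ({"tool_head.w": [3.0, 4.0]}, 300),  # substation B, 300 local samples
])
```

Only parameters leave the edge nodes, so raw field data from each substation stays local.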
[0047] In one possible implementation, when obtaining training samples based on the large language model for power business, several training samples are obtained for the same input question.
[0048] Illustratively, multiple training samples are sampled for the same input question. These diverse samples strengthen the compressed model's understanding of complex semantics and contexts in the power sector and avoid the overfitting that relying on a single sample per question can cause. Diverse training samples also improve the compressed model's robustness to ambiguous expressions, variant technical terms, and multi-turn dialogue in power business, so that it produces more accurate and comprehensive reasoning in real power business scenarios and performs better in practical applications.
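Collecting several samples per question can be sketched as repeated sampled queries to the teacher, with verbatim duplicates removed. The `teacher` callable is a hypothetical stand-in for querying the power-business large language model with stochastic decoding, and the deduplication rule is an illustrative assumption.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical teacher interface: (question, rng) -> (reasoning_chain, answer).
Teacher = Callable[[str, random.Random], Tuple[Tuple[str, ...], str]]


def collect_samples(question: str, teacher: Teacher, k: int, seed: int = 0) -> List[dict]:
    """Query the teacher k times for one question and keep distinct trajectories."""
    rng = random.Random(seed)
    seen = set()
    samples = []
    for _ in range(k):
        chain, answer = teacher(question, rng)
        if (chain, answer) in seen:  # drop verbatim repeats, keep varied phrasings
            continue
        seen.add((chain, answer))
        samples.append({"input_question": question,
                        "reasoning_chain": list(chain),
                        "reasoning_answer": answer})
    return samples


def toy_teacher(question: str, rng: random.Random):
    """Stub that returns one of two phrasings of the same diagnosis."""
    phrasings = [
        (("Check the relay trip records.", "Compare switch fault indicators."),
         "Fault between S3 and S4."),
        (("Read the fault indicators on S3 and S4.", "Narrow down the faulted section."),
         "Fault between S3 and S4."),
    ]
    return rng.choice(phrasings)


samples = collect_samples("Where is the fault on feeder F12?", toy_teacher, k=8)
```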
[0049] In one possible implementation, the construction of the lightweight model includes: constructing a lightweight model using a Transformer architecture or a long short-term memory network.
[0050] Illustratively, the lightweight model is built on a Transformer architecture (such as variants of BERT (Bidirectional Encoder Representations from Transformers) or GPT (Generative Pre-trained Transformer)) or on a long short-term memory (LSTM) network, balancing parameter count against inference speed and improving response efficiency and accuracy for tasks such as real-time decision-making and multi-turn dialogue in power scenarios.
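To make the parameter-count trade-off concrete, the per-layer parameter counts of the two candidate backbones can be estimated with standard formulas. The sketch assumes a standard Transformer layer with biases and a feed-forward width of four times the model width, and an LSTM with one bias vector per gate; embeddings and output layers are ignored.

```python
def transformer_layer_params(d_model: int, ffn_mult: int = 4) -> int:
    """Parameters of one standard Transformer layer with biases.

    Attention: Q, K, V and output projections -> 4 * (d*d + d)
    Feed-forward: d -> ffn_mult*d -> d        -> 2*ffn_mult*d*d + ffn_mult*d + d
    Two layer norms (scale and shift each)    -> 4 * d
    """
    d, f = d_model, ffn_mult * d_model
    attention = 4 * (d * d + d)
    ffn = d * f + f + f * d + d
    norms = 4 * d
    return attention + ffn + norms


def lstm_layer_params(input_size: int, hidden_size: int) -> int:
    """Parameters of one LSTM layer with a single bias vector per gate.

    Some frameworks (e.g. PyTorch) keep two bias vectors per gate,
    which adds another 4 * hidden_size parameters.
    """
    h, x = hidden_size, input_size
    return 4 * (h * (x + h) + h)
```

With the default feed-forward width, a Transformer layer has 12d² + 13d parameters, so `transformer_layer_params(768)` gives 7,087,872 per layer, while an LSTM layer with input and hidden size 256 has 525,312. The counts say nothing about latency on their own: an LSTM processes tokens sequentially, whereas a Transformer layer parallelizes across the sequence.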
[0051] In one possible implementation, when acquiring training samples based on the large language model for power business, the tool call actions in the tool call information are encoded as tokens or structured instructions; the tool call head calling the tool when a call is needed includes: the tool call head generating, based on the reasoning result, a call instruction encoded as a token or structured instruction and sending it to the call interface; wherein the call instruction is used to trigger the call interface to call the tool and feed back the call result.
[0052] Illustratively, external tool interfaces specific to power business are predefined, such as a power grid simulation module (for power flow calculation and fault simulation) and an equipment status query API (which returns the operating parameters of field equipment). When training samples are acquired from the large language model for power business, the relevant tool call actions are encoded as call instructions in the form of tokens or structured instructions, so that the large language model generates the corresponding call instruction when a call is needed. Having learned these patterns, the compressed model can output similar instructions during inference. Specifically, a discrimination mechanism is added to the compressed model: when its output is a call instruction, the system triggers the corresponding call interface to invoke the tool, and the data returned by the tool (the call result) is injected into the compressed model as new input context to guide subsequent inference.
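The discrimination mechanism can be sketched as a parser that checks whether the compressed model's output contains a well-formed call instruction. The `<tool_call>...</tool_call>` wrapper, the JSON payload, and the tool names are hypothetical; the invention states only that call actions are encoded as tokens or structured instructions.

```python
import json
import re
from typing import Any, Dict, Optional

# Hypothetical structured-instruction format; the invention does not fix one.
CALL_PATTERN = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
KNOWN_TOOLS = {"grid_simulation", "device_status_query"}


def parse_call_instruction(model_output: str) -> Optional[Dict[str, Any]]:
    """Return the call instruction if the output contains a well-formed one
    for a known tool; otherwise return None (ordinary reasoning text)."""
    match = CALL_PATTERN.search(model_output)
    if match is None:
        return None
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None  # malformed payload: treat as non-call output
    if not isinstance(call, dict) or call.get("tool") not in KNOWN_TOOLS:
        return None
    call.setdefault("args", {})
    return call
```

Rejecting unknown tools and malformed payloads keeps a partially learned instruction from triggering an interface in the field.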
[0053] For example, the specific interaction mechanism is as follows:
[0054] 1. Call trigger: When the compressed model outputs a call instruction in the form of a token or structured instruction, the corresponding call interface is triggered automatically. The call parameters are obtained from the reasoning result via the tool call head and input into the tool, which returns the call result (such as simulation result values or device status text).
[0055] 2. Data injection: The call result returned by the tool is injected into the compressed model as new input for subsequent reasoning steps, forming a closed "inference-call-re-inference" loop.
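The two steps above can be sketched as follows, assuming the call instruction has already been parsed into a dictionary. The tool registry, the tool names, and the `Observation[...]` context format are hypothetical placeholders for the power grid simulation module and equipment status query API.

```python
from typing import Any, Callable, Dict, List

ToolFn = Callable[..., Any]


def trigger_and_inject(call: Dict[str, Any], registry: Dict[str, ToolFn],
                       context: List[str]) -> List[str]:
    """Call trigger, then data injection.

    1. Dispatch the parsed call instruction to the matching tool interface.
    2. Append the returned result to the input context so the next reasoning
       step is conditioned on it. A new list is returned; the input is unchanged.
    """
    tool = registry.get(call["tool"])
    if tool is None:
        result: Any = {"error": f"unknown tool {call['tool']!r}"}
    else:
        result = tool(**call.get("args", {}))
    return context + [f"Observation[{call['tool']}]: {result}"]


# Hypothetical registry entry standing in for an equipment status query API.
registry = {"device_status_query": lambda devices: {d: "normal" for d in devices}}
ctx = trigger_and_inject(
    {"tool": "device_status_query", "args": {"devices": ["S3"]}},
    registry,
    ["Question: Is switch S3 healthy?"],
)
```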
[0056] Based on this design, the limitations of pure language reasoning can be effectively avoided. By improving the alignment between decision-making and the physical laws of the power system through real-time data interaction, the risk of distortion associated with pure language reasoning can be reduced.
[0057] In one possible implementation, when training the compressed model to be trained based on the plurality of training samples, the following loss function is used:
[0058]
$$\mathcal{L} = \alpha\,\mathcal{L}_{\mathrm{CoT}} + \beta\,\mathcal{L}_{\mathrm{ans}} + \gamma\,\mathcal{L}_{\mathrm{tool}}$$
[0059] where $\mathcal{L}_{\mathrm{CoT}}$ is the error between each reasoning step of the compressed model to be trained and the reasoning thought chain of the training samples; $\mathcal{L}_{\mathrm{ans}}$ is the error between the final reasoning result of the compressed model to be trained and the reasoning answer of the training samples; $\mathcal{L}_{\mathrm{tool}}$ is the error between the tool calls of the compressed model to be trained and the tool call information of the training samples; and $\alpha$, $\beta$, and $\gamma$ are preset weighting coefficients.
[0060] Illustratively, the joint loss function combines the reasoning thought chain loss $\mathcal{L}_{\mathrm{CoT}}$, the output alignment loss $\mathcal{L}_{\mathrm{ans}}$, and the tool call imitation loss $\mathcal{L}_{\mathrm{tool}}$, so that the compressed model aligns with the large language model for power business in reasoning thought chains, tool calls, and reasoning answers. For example, $\mathcal{L}_{\mathrm{CoT}}$ can be the cross-entropy error between each reasoning step of the compressed model to be trained and the reasoning thought chain of the training samples; $\mathcal{L}_{\mathrm{ans}}$ can be the cross-entropy or KL divergence between the final reasoning result of the compressed model to be trained and the reasoning answer of the training samples; and $\mathcal{L}_{\mathrm{tool}}$ can be the multi-class cross-entropy between the tool calls of the compressed model to be trained and the tool call information of the training samples. The weights $\alpha$, $\beta$, and $\gamma$ balance the different losses, so that joint optimization keeps the reasoning chain accurate while aligning the corresponding reasoning answers and tool calls.
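A minimal numerical sketch of this joint loss follows. It assumes that each reasoning step is scored token by token against the teacher's tokens, that the answer is compared as a probability distribution using KL divergence, and that each tool decision is a single multi-class label. Averaging over tokens and calls, and the default weights of 1.0, are assumptions rather than values from the invention.

```python
import math
from typing import List, Sequence


def cross_entropy(probs: Sequence[float], target: int, eps: float = 1e-12) -> float:
    """Negative log-probability assigned to the target class."""
    return -math.log(probs[target] + eps)


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q), with p the teacher's answer distribution and q the student's."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def mean_ce(prob_list: List[Sequence[float]], targets: List[int]) -> float:
    """Average cross-entropy over a list of predictions."""
    return sum(cross_entropy(p, t) for p, t in zip(prob_list, targets)) / len(targets)


def joint_loss(cot_token_probs, cot_token_targets,
               answer_teacher, answer_student,
               tool_probs, tool_targets,
               alpha=1.0, beta=1.0, gamma=1.0):
    """L = alpha * L_CoT + beta * L_ans + gamma * L_tool."""
    l_cot = mean_ce(cot_token_probs, cot_token_targets)    # thought chain imitation
    l_ans = kl_divergence(answer_teacher, answer_student)  # output alignment
    l_tool = mean_ce(tool_probs, tool_targets)             # tool call imitation
    return alpha * l_cot + beta * l_ans + gamma * l_tool


# Toy example: one CoT token, a two-way answer, and one tool decision over
# hypothetical labels [no_call, grid_simulation, device_status_query].
loss = joint_loss([[0.5, 0.5]], [0], [1.0, 0.0], [0.5, 0.5], [[0.25, 0.25, 0.5]], [2])
```

In this toy case each term equals ln 2, so raising one weight shifts the optimization toward that track without changing the other two terms.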
[0061] In one possible implementation, referring to Figure 2, the reasoning process of the compressed model in practical applications is as follows: it accepts an input question from the power business and reaches a decision through a cycle of progressively generating reasoning steps, calling tools, and fusing the results into further reasoning. At each step the lightweight model outputs a reasoning result; the thought chain generation head generates a reasoning step from that result and feeds it back to the lightweight model, and the tool call head determines from the same result whether a tool needs to be called and, if so, calls the tool and feeds the call result back to the lightweight model. This process repeats until the final reasoning result is output as the reasoning answer.
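The cycle of Figure 2 can be sketched as the loop below. The three callables are hypothetical stand-ins for the trained lightweight model and its two heads; the `max_steps` cap is an added safeguard not mentioned in the text, and the scripted stubs exist only to make the example runnable.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def run_compressed_model(
    question: str,
    backbone: Callable[[List[str]], str],
    cot_head: Callable[[str], Optional[str]],
    tool_head: Callable[[str], Optional[Dict[str, Any]]],
    tools: Dict[str, Callable[..., Any]],
    max_steps: int = 8,
) -> Tuple[str, List[str]]:
    """Reason, generate a step and/or call a tool, re-reason; stop when neither head acts.

    Returns the final reasoning result (the answer) and the reasoning thought
    chain, i.e. the generated steps in chronological order.
    """
    context = [f"Question: {question}"]  # no steps or call results yet
    chain: List[str] = []
    result = ""
    for _ in range(max_steps):
        result = backbone(context)
        step = cot_head(result)
        call = tool_head(result)
        if step is None and call is None:  # no further step or call: final answer
            break
        if step is not None:
            chain.append(step)
            context.append(f"Step: {step}")
        if call is not None:
            observation = tools[call["tool"]](**call.get("args", {}))
            context.append(f"Observation: {observation}")
    return result, chain


# Scripted stubs standing in for the trained model and a power flow tool.
PLAN_STEP = "Run a power flow for the current dispatch."
answer, chain = run_compressed_model(
    "Is line L7 overloaded?",
    backbone=lambda ctx: "plan" if len(ctx) == 1 else f"final: {ctx[-1]}",
    cot_head=lambda r: PLAN_STEP if r == "plan" else None,
    tool_head=lambda r: {"tool": "grid_simulation", "args": {"case": "current"}} if r == "plan" else None,
    tools={"grid_simulation": lambda case: {"L7_loading_pct": 92.0}},
)
```

Returning the chain alongside the answer mirrors the configuration in which the compressed model outputs both the reasoning answer and the reasoning thought chain as supporting evidence.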
[0062] When the lightweight model performs reasoning based on the input question, reasoning steps, and call results and outputs the reasoning results, the reasoning steps and call results may be empty (for example, at the first step, before any reasoning step has been generated or any tool has been called).
[0063] For example, the output of the compressed model can be configured as the reasoning answer together with the reasoning thought chain (i.e., the reasoning steps generated by the thought chain generation head, combined in chronological order). The chain provides explanatory evidence that helps users understand and verify the decisions, thereby helping to ensure their accuracy.
[0064] In one possible implementation, based on a large language model compression method for power business in a specific application scenario, the resulting compressed model reduces inference latency by over 50% and computational power requirements by over 80%. Simultaneously, compared to compressed models obtained using current conventional compression methods, decision accuracy is improved by over 30%, and errors are reduced by 25%. Furthermore, generalization capability is improved by 40%, enabling rapid adaptation to new scenarios such as renewable energy grid integration. When integrating with existing power systems, the engineering cycle is shortened by 50%, demonstrating both technological advancement and engineering practicality.
[0065] The following are embodiments of the apparatus of the present invention, which can be used to execute embodiments of the method of the present invention. For details not disclosed in the apparatus embodiments, please refer to the embodiments of the method of the present invention.
[0066] Referring to Figure 3, in another embodiment of the present invention, a large language model compression system for power business is provided, which can be used to implement the above-mentioned large language model compression method for power business. Specifically, the system includes a sample acquisition module, a model construction module, and a model compression module.
[0067] The sample acquisition module is used to acquire training samples based on the large language model for power business; the training samples include input questions, reasoning thought chains, tool call information, and reasoning answers. The model construction module is used to construct a lightweight model and set a parallel thought chain generation head and tool call head after the lightweight model to obtain the compressed model to be trained. The lightweight model is used to perform reasoning based on the input question, reasoning steps, and call results and to output the reasoning results. The thought chain generation head is used to generate reasoning steps based on the reasoning results and feed them back to the lightweight model. The tool call head is used to determine, based on the reasoning results, whether a tool needs to be called and, when it does, to call the tool and feed the call results back to the lightweight model. The model compression module is used to train the compressed model to be trained based on the training samples to obtain the compressed model.
[0068] In one possible implementation, when training samples are obtained based on the large language model for power business, a plurality of training samples are obtained for the same input question.
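One way to hold several training samples per input question is a small record type plus a grouping step. This is a sketch under stated assumptions. The field names are hypothetical, and `toy_teacher` stands in for the large language model for power business. The embodiment does not say how the samples for one question differ, so varying a sampling seed is an illustrative assumption.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    tool: str       # e.g. a grid simulation module or an equipment status query API
    arguments: str
    result: str


@dataclass(frozen=True)
class TrainingSample:
    question: str
    thought_chain: tuple[str, ...]    # reasoning thought chain, one entry per step
    tool_calls: tuple[ToolCall, ...]  # tool call information
    answer: str                       # reasoning answer


def collect_samples(
    question: str, teacher: Callable[[str, int], TrainingSample], n: int
) -> list[TrainingSample]:
    """Query the teacher n times for the same question, varying only the seed."""
    return [teacher(question, seed) for seed in range(n)]


def group_by_question(samples: list[TrainingSample]) -> dict[str, list[TrainingSample]]:
    groups: dict[str, list[TrainingSample]] = defaultdict(list)
    for sample in samples:
        groups[sample.question].append(sample)
    return dict(groups)


# Hypothetical teacher stub standing in for the large language model for power business.
def toy_teacher(question: str, seed: int) -> TrainingSample:
    chain = ("identify the device in the question", f"route {seed}: query its status")
    call = ToolCall("equipment_status_api", "breaker_3", "closed")
    return TrainingSample(question, chain, (call,), "Breaker 3 is closed.")


samples = collect_samples("Is breaker 3 closed?", toy_teacher, n=3)
groups = group_by_question(samples)
print(len(groups["Is breaker 3 closed?"]))  # 3
```

The records are frozen, so thought chains are hashable. This makes it easy to deduplicate identical reasoning paths before training.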
[0069] In one possible implementation, constructing the lightweight model includes: building the lightweight model using a Transformer architecture or a long short-term memory (LSTM) network.
[0070] In one possible implementation, when training samples are acquired based on the large language model for power business, the tool invocation actions in the tool invocation information are encoded as tokens or structured instructions. When a tool is needed, the tool invocation head invokes it as follows: based on the inference result, the tool invocation head generates an invocation instruction encoded as a token or structured instruction and sends it to the invocation interface. The invocation instruction triggers the invocation interface to invoke the tool and return the invocation result.
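The encoding of an invocation instruction and its handling by the invocation interface can be sketched as follows. The sketch makes these assumptions:

- The structured instruction is a JSON object wrapped in special tokens.
- The special tokens `<tool_call>` / `</tool_call>` and the names `encode_call`, `parse_call` and `InvocationInterface` are hypothetical.
- The two `toy_*` functions are stand-ins for the power grid simulation module and the equipment status query API named in the claims. They are not real interfaces.

```python
from __future__ import annotations

import json
from typing import Any, Callable, Optional

# Hypothetical special tokens delimiting an encoded tool call in the model's output.
TOOL_OPEN, TOOL_CLOSE = "<tool_call>", "</tool_call>"


def encode_call(tool: str, params: dict[str, Any]) -> str:
    """Encode a tool invocation as a JSON structured instruction wrapped in special tokens."""
    body = json.dumps({"tool": tool, "params": params}, sort_keys=True)
    return f"{TOOL_OPEN}{body}{TOOL_CLOSE}"


def parse_call(text: str) -> Optional[tuple[str, dict[str, Any]]]:
    """Return (tool, params) if text contains an encoded call, otherwise None."""
    start = text.find(TOOL_OPEN)
    if start == -1:
        return None
    end = text.find(TOOL_CLOSE, start + len(TOOL_OPEN))
    if end == -1:
        return None
    payload = json.loads(text[start + len(TOOL_OPEN):end])
    return payload["tool"], payload["params"]


class InvocationInterface:
    """Receives an instruction, extracts its parameters, calls the tool, returns the result."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., str]] = {}

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self._tools[name] = fn

    def invoke(self, instruction: str) -> str:
        parsed = parse_call(instruction)
        if parsed is None:
            raise ValueError("no encoded tool call in instruction")
        tool, params = parsed
        if tool not in self._tools:
            raise KeyError(f"unknown tool: {tool}")
        return self._tools[tool](**params)


# Toy stand-ins for the power grid simulation module and equipment status query API.
def toy_grid_simulation(bus: str, load_mw: float) -> str:
    return f"simulated {bus} at {load_mw} MW"


def toy_equipment_status(device_id: str) -> str:
    return {"breaker_3": "closed"}.get(device_id, "unknown")


interface = InvocationInterface()
interface.register("grid_simulation", toy_grid_simulation)
interface.register("equipment_status", toy_equipment_status)

instruction = encode_call("equipment_status", {"device_id": "breaker_3"})
print(interface.invoke(instruction))  # closed
```

The head only has to emit the encoded text, and the interface does the actual call. Because of this separation, the same tool call format can be used both for labeling teacher samples and for supervising the student's tool call head.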
[0071] In one possible implementation, when the compressed model to be trained is trained based on the plurality of training samples, the following loss function is used:
[0072]
$$\mathcal{L} = \lambda_{1}\,\mathcal{L}_{\mathrm{CoT}} + \lambda_{2}\,\mathcal{L}_{\mathrm{ans}} + \lambda_{3}\,\mathcal{L}_{\mathrm{tool}}$$
[0073] where $\mathcal{L}_{\mathrm{CoT}}$ is the error between each reasoning step of the compressed model to be trained and the reasoning thought chain of the training samples; $\mathcal{L}_{\mathrm{ans}}$ is the error between the final inference result of the compressed model to be trained and the inference answer of the training samples; $\mathcal{L}_{\mathrm{tool}}$ is the error between the tool calls of the compressed model to be trained and the tool call information of the training samples; and $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ are all preset weighting coefficients.
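The training loss combines three error terms with preset weighting coefficients: a reasoning-step term, a final-answer term and a tool call term. The sketch below assumes the three terms are combined as a weighted sum. It also assumes each error is measured as the mean negative log-likelihood that the student assigns to the reference tokens. The embodiment does not specify the error measure. The names `w_cot`, `w_ans` and `w_tool` are hypothetical labels for the preset coefficients.

```python
from __future__ import annotations

import math


def mean_nll(target_probs: list[float]) -> float:
    """Mean negative log-likelihood the model assigns to the reference tokens."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


def distillation_loss(
    step_probs: list[list[float]],  # per reasoning step: probs of the reference step tokens
    answer_probs: list[float],      # probs of the reference answer tokens
    tool_probs: list[float],        # probs of the reference tool call tokens
    w_cot: float = 1.0,
    w_ans: float = 1.0,
    w_tool: float = 1.0,
) -> tuple[float, tuple[float, float, float]]:
    """Weighted sum of the thought chain, answer and tool call errors (assumed form)."""
    l_cot = sum(mean_nll(p) for p in step_probs) / len(step_probs)  # averaged over steps
    l_ans = mean_nll(answer_probs)
    l_tool = mean_nll(tool_probs)
    total = w_cot * l_cot + w_ans * l_ans + w_tool * l_tool
    return total, (l_cot, l_ans, l_tool)


total, parts = distillation_loss(
    step_probs=[[0.5, 0.5], [1.0]],
    answer_probs=[0.25],
    tool_probs=[1.0],
    w_cot=2.0,
    w_ans=1.0,
    w_tool=0.5,
)
print(round(total, 4))  # 2.0794 (= 3 ln 2)
```

The loss is averaged per step rather than summed over all step tokens. As a result, a long thought chain does not automatically dominate the answer and tool call terms, and the relative emphasis stays under the control of the weighting coefficients.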
[0074] All relevant details of each step in the foregoing embodiments of the large language model compression method for power business apply to the functional descriptions of the corresponding functional modules of the large language model compression system for power business in the embodiments of the present invention, and are not repeated here.
[0075] The module division in this embodiment of the invention is illustrative and represents only one logical functional division. In actual implementation, other division methods may be used. Furthermore, the functional modules in the various embodiments of the invention can be integrated into a single processor, exist as separate physical entities, or be integrated into a single module. The integrated modules described above can be implemented in hardware or as software functional modules.
[0076] In another embodiment of the present invention, a computer device is provided, comprising a processor and a memory. The memory stores a computer program comprising program instructions, and the processor executes the program instructions stored in the computer storage medium. The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, a discrete hardware component, etc. It is the computing and control core of the terminal and is suitable for implementing one or more instructions. Specifically, it loads and executes one or more instructions from the computer storage medium to achieve the corresponding method flow or function. The processor described in this embodiment of the present invention can be used to perform the large language model compression method for power business.
[0077] In another embodiment of the present invention, a storage medium is provided, specifically a computer-readable storage medium (Memory), which is a memory device in a computer device used to store programs and data. It is understood that the computer-readable storage medium here can include both the built-in storage medium in the computer device and extended storage media supported by the computer device. The computer-readable storage medium provides storage space that stores the terminal's operating system. Furthermore, the storage space also stores one or more instructions suitable for loading and execution by a processor. These instructions can be one or more computer programs (including program code). It should be noted that the computer-readable storage medium here can be high-speed RAM or non-volatile memory, such as at least one disk storage device. The processor can load and execute one or more instructions stored in the computer-readable storage medium to implement the corresponding steps of the large language model compression method for power business in the above embodiments.
[0078] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0079] This invention is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine. The instructions executed by the processor of the computer or other programmable data processing apparatus then produce means for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
[0080] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner. The instructions stored in the computer-readable storage medium then produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
[0081] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, causing a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process. The instructions executed on the computer or other programmable equipment thereby provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
[0082] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the specific implementation of the present invention. Any modifications or equivalent substitutions that do not depart from the spirit and scope of the present invention should be covered within the scope of protection of the claims of the present invention.
Claims
1. A method for compressing large language models for power business, characterized by comprising: obtaining training samples based on a large language model for power business, wherein the training samples include input question text, a reasoning thought chain, tool call information, and a reasoning answer; constructing a lightweight model, and setting a thought chain generation head and a tool invocation head in parallel after the lightweight model to obtain a compressed model to be trained, wherein the lightweight model is used to perform inference based on the input question, the inference steps, and the invocation results, and to output inference results; the thought chain generation head is used to generate, based on the inference results, intermediate inference step text for the large language model oriented towards power business and to feed it back to the lightweight model; the tool invocation head is used to determine, based on the inference results, whether a tool needs to be invoked, and when a tool needs to be invoked, to invoke the tool and feed the invocation results back to the lightweight model; the tools include a power grid simulation module and an equipment status query API, and the invocation results include simulation result values and equipment status text; and training the compressed model to be trained on the training samples to obtain the compressed model.
2. The large language model compression method for power business according to claim 1, characterized in that, when training samples are obtained based on the large language model for power business, a plurality of training samples are obtained for the same input question.
3. The large language model compression method for power business according to claim 1, characterized in that constructing the lightweight model comprises: building the lightweight model using a Transformer architecture or a long short-term memory network.
4. The large language model compression method for power business according to claim 1, characterized in that, when training samples are acquired based on the large language model for power business, the tool invocation actions in the tool invocation information are encoded as tokens or structured instructions; and the tool invocation head invoking a tool when one is needed comprises: the tool invocation head generating, based on the inference result, an invocation instruction encoded as a token or structured instruction and sending it to the invocation interface, wherein the invocation instruction is used to trigger the invocation interface to obtain invocation parameters, invoke the tool, and return the invocation result.
5. The large language model compression method for power business according to claim 1, characterized in that, when the compressed model to be trained is trained based on the training samples, the following loss function is used: $\mathcal{L} = \lambda_{1}\,\mathcal{L}_{\mathrm{CoT}} + \lambda_{2}\,\mathcal{L}_{\mathrm{ans}} + \lambda_{3}\,\mathcal{L}_{\mathrm{tool}}$, where $\mathcal{L}_{\mathrm{CoT}}$ is the error between each reasoning step of the compressed model to be trained and the reasoning thought chain of the training samples; $\mathcal{L}_{\mathrm{ans}}$ is the error between the final inference result of the compressed model to be trained and the inference answer of the training samples; $\mathcal{L}_{\mathrm{tool}}$ is the error between the tool calls of the compressed model to be trained and the tool call information of the training samples; and $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ are all preset weighting coefficients.
6. A large language model compression system for power business, characterized by comprising: a sample acquisition module, used to acquire training samples based on a large language model for power business, wherein the training samples include input question text, a reasoning thought chain, tool call information, and a reasoning answer; a model building module, used to construct a lightweight model and set a thought chain generation head and a tool invocation head in parallel after the lightweight model to obtain a compressed model to be trained, wherein the lightweight model is used to perform inference based on the input question, the inference steps, and the invocation results, and to output inference results; the thought chain generation head is used to generate, based on the inference results, intermediate inference step text for the large language model oriented towards power business and to feed it back to the lightweight model; the tool invocation head is used to determine, based on the inference results, whether a tool needs to be invoked, and when a tool needs to be invoked, to invoke the tool and feed the invocation results back to the lightweight model; the tools include a power grid simulation module and an equipment status query API, and the invocation results include simulation result values and equipment status text; and a model compression module, used to train the compressed model to be trained on the training samples to obtain the compressed model.
7. The large language model compression system for power business according to claim 6, characterized in that, when training samples are obtained based on the large language model for power business, a plurality of training samples are obtained for the same input question.
8. The large language model compression system for power business according to claim 6, characterized in that constructing the lightweight model comprises: building the lightweight model using a Transformer architecture or a long short-term memory network.
9. The large language model compression system for power business according to claim 6, characterized in that, when training samples are acquired based on the large language model for power business, the tool invocation actions in the tool invocation information are encoded as tokens or structured instructions; and the tool invocation head invoking a tool when one is needed comprises: the tool invocation head generating, based on the inference result, an invocation instruction encoded as a token or structured instruction and sending it to the invocation interface, wherein the invocation instruction is used to trigger the invocation interface to invoke the tool and return the invocation result.
10. The large language model compression system for power business according to claim 6, characterized in that, when the compressed model to be trained is trained based on the training samples, the following loss function is used: $\mathcal{L} = \lambda_{1}\,\mathcal{L}_{\mathrm{CoT}} + \lambda_{2}\,\mathcal{L}_{\mathrm{ans}} + \lambda_{3}\,\mathcal{L}_{\mathrm{tool}}$, where $\mathcal{L}_{\mathrm{CoT}}$ is the error between each reasoning step of the compressed model to be trained and the reasoning thought chain of the training samples; $\mathcal{L}_{\mathrm{ans}}$ is the error between the final inference result of the compressed model to be trained and the inference answer of the training samples; $\mathcal{L}_{\mathrm{tool}}$ is the error between the tool calls of the compressed model to be trained and the tool call information of the training samples; and $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ are all preset weighting coefficients.
11. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the large language model compression method for power business according to any one of claims 1 to 5.
12. A computer-readable storage medium storing a computer program, characterized in that, When the computer program is executed by the processor, it implements the large language model compression method for power business as described in any one of claims 1 to 5.