Training system and training method
By identifying and freezing unchanged parameters before training an artificial intelligence model, and training only a subset of parameters, the problems of long training time and high resource consumption are solved, achieving resource savings and time reduction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- REALTEK SEMICON CORP
- Filing Date
- 2024-12-18
- Publication Date
- 2026-06-19
AI Technical Summary
Training artificial intelligence models requires a large amount of training data, resulting in high energy consumption and long training time.
By first adjusting the parameters of the AI module with a small amount of training data, parameters that satisfy the unchanged condition are identified and frozen. During subsequent training on a large amount of training data, these frozen parameters are skipped and only a subset of the parameters is trained.
This saves resources and reduces training time while maintaining the effectiveness of the AI model.
Figure CN122242606A_ABST
Description
Technical Field
[0001] This invention relates to the field of artificial intelligence, and specifically to techniques for training artificial intelligence.
Background Technology
[0002] Currently, training artificial intelligence models (such as large language models, LLMs) typically requires massive amounts of training data. Even when a pre-trained model is used, training it with new data still involves training the entire AI system. This results in enormous energy consumption and long training times.
Summary of the Invention
[0003] In view of the above, some embodiments of the present invention provide a training system, a training method, a non-transitory computer-readable recording medium storing a program, and a non-transitory computer program product to address the problems of the prior art.
[0004] Some embodiments of the present invention provide a training system comprising an artificial intelligence module and a processing module. The artificial intelligence module includes a plurality of parameters, and the processing module is configured to perform: training the artificial intelligence module based on a first training set, and performing a verification procedure to confirm whether the performance of the artificial intelligence module has improved; in response to an improvement in the performance of the artificial intelligence module, confirming whether the parameters include a frozen parameter set, wherein the parameter value change of each element in the frozen parameter set satisfies an unchanged condition; and in response to the parameters including the frozen parameter set, freezing the frozen parameter set and training the artificial intelligence module based on a second training set, wherein the number of elements in the second training set is greater than the number of elements in the first training set.
[0005] Some embodiments of the present invention provide a training method for training an artificial intelligence module, executed by a processing module, wherein the artificial intelligence module includes a plurality of parameters. The training method includes: training the artificial intelligence module based on a first training set, and executing a verification procedure to confirm whether the performance of the artificial intelligence module has improved; in response to an improvement in the performance of the artificial intelligence module, confirming whether the parameters include a frozen parameter set, wherein the parameter value change of each element in the frozen parameter set satisfies an unchanged condition; and in response to the parameters including the frozen parameter set, freezing the frozen parameter set and training the artificial intelligence module based on a second training set, wherein the number of elements in the second training set is greater than the number of elements in the first training set.
[0006] Some embodiments of the present invention provide a non-transitory computer-readable medium containing a program and a non-transitory computer program product, which can complete the aforementioned training method after the processing unit loads and executes the program.
[0007] Based on the above, some embodiments of the present invention provide a training system, a training method, a computer-readable recording medium storing a program, and a non-transitory computer program product. Before a large amount of training data (a second training set) is used to adjust or train an artificial intelligence module, a small amount of training data (a first training set) is used to trigger parameter adjustments of the artificial intelligence module. If some parameters of the artificial intelligence module meet the condition of not changing during this training, these parameters are set as elements of the frozen parameter set. Subsequently, during large-scale learning based on the second training set, these parameters are skipped and left unadjusted (in other words, only some parameters of the artificial intelligence module are trained) to save resources. When the artificial intelligence module contains a large number of parameters (e.g., the artificial intelligence module contains a large language model), the method disclosed in the foregoing embodiments can save considerable resources and reduce training time.
Attached Figure Description
[0008] Figure 1 is a block diagram of a training system illustrated according to some embodiments of the present invention.
[0009] Figure 2A is a schematic diagram of an artificial intelligence module illustrated according to some embodiments of the present invention.
[0010] Figure 2B is a schematic diagram illustrating the operation of an artificial intelligence module according to some embodiments of the present invention.
[0011] Figure 3 is a block diagram of an electronic device system illustrated according to some embodiments of the present invention.
[0012] Figure 4 is a schematic diagram of a training method flow illustrated according to some embodiments of the present invention.
[0013] Figure 5 is a schematic diagram of a verification procedure illustrated according to some embodiments of the present invention.
[0014] Figure 6 is a schematic diagram of a verification procedure illustrated according to some embodiments of the present invention.
[0015] Figure 7 is a schematic diagram of a training method flow illustrated according to some embodiments of the present invention.
Detailed Implementation
[0016] The foregoing descriptions and other technical contents, features, and effects of this invention will be clearly presented in the following detailed description of embodiments with reference to the accompanying drawings. Any modifications and alterations that do not affect the effects and objectives achieved by this invention should still fall within the scope of the technical contents disclosed in this invention. In all drawings, the same reference numerals will be used to denote the same or similar elements.
[0017] Figure 1 is a block diagram of a training system illustrated according to some embodiments of the present invention. Referring to Figure 1, the training system 100 includes a processing module 101 and an artificial intelligence module 102. The artificial intelligence module 102 contains multiple trainable parameters. The artificial intelligence module 102 may, for example, contain a pre-trained neural network model or an untrained neural network model. After the processing module 101 trains the artificial intelligence module 102 based on a training set, the parameters of the artificial intelligence module 102 are determined. In some embodiments of the present invention, the artificial intelligence module 102 includes a large language model, for example a GPT-3 or T5 model.
[0018] The following is a detailed description, with reference to the accompanying drawings, of the training methods of some embodiments of the present invention and how the modules of the training system 100 work together.
[0019] Figure 4 is a schematic diagram of a training method flow illustrated according to some embodiments of the present invention. Referring to Figure 1 and Figure 4 together, in some embodiments of the present invention, the training method includes steps S401 to S406 executed by the processing module 101. In step S401, the processing module 101 trains the artificial intelligence module 102 based on a first training set. In step S402, the processing module 101 performs a verification procedure on the artificial intelligence module 102 trained on the first training set. The verification procedure is mainly used to verify whether the performance of the artificial intelligence module 102 has improved after training on the first training set. The processing module 101 can use different verification procedures to verify the performance of the artificial intelligence module 102.
[0020] In step S403, the processing module 101 confirms whether the performance of the artificial intelligence module 102 has improved. In step S404, in response to the improved performance of the artificial intelligence module 102, the processing module 101 examines the parameter value change of each parameter of the artificial intelligence module 102 after training on the first training set, selects the parameters whose parameter value changes satisfy the unchanged condition, and places them into a frozen parameter set. The parameter value change of a parameter is its value after training on the first training set minus its value before that training. The frozen parameter set is thus the set of those parameters of the artificial intelligence module 102 whose parameter value changes satisfy the unchanged condition; if it is non-empty, the parameter value change of each of its elements satisfies the unchanged condition. The processing module 101 determines whether the frozen parameter set is empty in order to confirm whether the parameters of the artificial intelligence module 102 include a frozen parameter set that needs to be frozen. If the frozen parameter set is non-empty, the processing module 101 confirms that the parameters of the artificial intelligence module 102 include the frozen parameter set; if it is empty, the processing module 101 confirms that the parameters of the artificial intelligence module 102 do not include the frozen parameter set.
[0021] In step S405, in response to the fact that the parameters of the artificial intelligence module 102 include a frozen parameter set, the processing module 101 freezes the frozen parameter set. In step S406, the processing module 101 trains the artificial intelligence module 102 based on a second training set, wherein the number of elements in the second training set is greater than the number of elements in the first training set. The elements in both the first and second training sets are also referred to as training data.
[0022] In some embodiments of the present invention, the second training set has training data of the same nature as the first training set. This training data of the same nature is used to enable the trained artificial intelligence module 102 to handle specific tasks. For example, both the second and first training sets consist of images in the style of oil paintings.
[0023] That the processing module 101 freezes the frozen parameter set and trains the artificial intelligence module 102 based on the second training set means that the processing module 101 fixes the parameter value of each parameter in the frozen parameter set and then trains the artificial intelligence module 102 based on the second training set. Figure 2A and Figure 2B are used below to explain how the frozen parameter set is frozen and how the artificial intelligence module 102 is trained based on the second training set.
[0024] Figure 2A is a schematic diagram of an artificial intelligence module illustrated according to some embodiments of the present invention. Figure 2B is a schematic diagram illustrating the operation of an artificial intelligence module according to some embodiments of the present invention. Referring to Figure 1, Figure 2A, Figure 2B, and Figure 4 together, in the example illustrated in Figure 2A and Figure 2B, the artificial intelligence module 102 includes a neural network whose architecture is shown as neural network 200. Neural network 200 includes input neurons 201, 203, and 205 and output neurons 202, 204, and 206. Input neurons 201, 203, and 205 are configured to receive input data, and the input values they obtain are x0, x1, and x2, respectively. The output values of output neurons 202, 204, and 206 are y0, y1, and y2, respectively. Input neurons 201, 203, and 205 are fully connected to output neurons 202, 204, and 206 via multiple connection paths, and each connection path has a weight. Taking the connection paths to output neuron 202 as an example, the weight of the connection path between input neuron 201 and output neuron 202 is w00, the weight of the connection path between input neuron 203 and output neuron 202 is w10, and the weight of the connection path between input neuron 205 and output neuron 202 is w20. Similarly, the weight of the connection path between input neuron 201 and output neuron 204 is w01, and so on. As illustrated in Figure 2A, output neuron 202 has a bias b0, a weighted input z0, and an activation function α. The weighted input z0 is the sum of the input values of input neurons 201, 203, and 205, weighted by their respective weights w00, w10, and w20, plus the bias b0, as shown in the following equation (Eq1):
[0025] $$z_0 = w_{00}\,x_0 + w_{10}\,x_1 + w_{20}\,x_2 + b_0 \tag{Eq1}$$
[0026] The output value y0 of output neuron 202 is α(z0), that is, y0 = α(z0). The weighted inputs z1 and z2 of output neurons 204 and 206, and their outputs, follow the same pattern. The activation function α is preset according to requirements (e.g., a sigmoid function or a ReLU function). In this example, for ease of explanation, the activation function α is set to the identity function, so the output value y0 of output neuron 202 equals the weighted input z0. In this embodiment, the weights w00, w01, w02, w10, w11, w12, w20, w21, and w22 and the biases b0, b1, and b2 are the parameters of the artificial intelligence module 102. After each training iteration, the values of the weights w00, w01, w02, w10, w11, w12, w20, w21, and w22 and of the biases b0, b1, and b2 are updated.
[0027] Referring to Figure 2B, the operation of neural network 200 is as illustrated there. The input values x0, x1, and x2 of input neurons 201, 203, and 205 are weighted by the weights of the connection paths and then summed with the corresponding biases b0, b1, and b2 by addition modules 207, 208, and 209 to obtain the weighted inputs z0, z1, and z2. The weighted inputs z0, z1, and z2 are then passed through activation function module 210, which evaluates the activation function α to produce the corresponding output values y0, y1, and y2. It is worth noting that addition modules 207, 208, and 209, as well as activation function module 210, can be implemented in hardware or in software; this invention is not limited in this respect.
[0028] In this example, the second training set is shown in Table (I) below, where X is the input value, Y is the expected output value, number 0 corresponds to input value x0 and output value y0, number 1 corresponds to input value x1 and output value y1, and number 2 corresponds to input value x2 and output value y2. The parameter values of the neural network 200 after training on the first training set are shown in Table (II) below, where, for simplicity, only the parameters corresponding to output y0 are listed: the value of weight w00 (0.2 in this embodiment), the value of weight w10 (0.5 in this embodiment), the value of weight w20 (0.5 in this embodiment), and the value of the bias b0 (0.3 in this embodiment).
[0029]
| Number i | X_i | Y_i |
|---|---|---|
| 0 | 5 | 6 |
| 1 | 3 | 2 |
| 2 | 7 | 6 |
[0030] Table (I)
[0031]
| Parameter | w00 | w10 | w20 | b0 |
|---|---|---|---|---|
| 0 | 0.2 | 0.5 | 0.5 | 0.3 |
[0032] Table (II)
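As a quick check of (Eq1) against the two tables, the following minimal sketch evaluates output neuron 202 with the identity activation used in this example. The function and variable names are illustrative and do not come from the specification; the result matches the output value y0 = 6.3 stated in paragraph [0039].

```python
# Sketch of output neuron 202 of neural network 200.
# Assumption: identity activation, as chosen in the example for ease of explanation.

def neuron_output(inputs, weights, bias, activation=lambda z: z):
    """Return activation(z), where z is the weighted input of (Eq1)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

x = [5.0, 3.0, 7.0]        # x0, x1, x2 from Table (I)
w_to_y0 = [0.2, 0.5, 0.5]  # w00, w10, w20 from Table (II)
b0 = 0.3                   # bias b0 from Table (II)

y0 = neuron_output(x, w_to_y0, b0)
print(round(y0, 6))  # 6.3
```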
[0033] In the aforementioned step S404, the processing module 101 records the values of all weights w00, w01, w02, w10, w11, w12, w20, w21, and w22 and of the biases b0, b1, and b2. If the processing module 101 confirms that the change in the value of weight w00 of the artificial intelligence module 102 after training on the first training set satisfies the unchanged condition, it selects weight w00 and adds it to the frozen parameter set. Then, in step S405, the processing module 101 confirms that the parameters of the artificial intelligence module 102 include the frozen parameter set. In step S404, the processing module 101 also records the value of weight w00 (0.2). In step S405, the processing module 101 freezes weight w00 at this value (0.2), and in step S406, the artificial intelligence module 102 is trained with the second training set.
[0034] If the processing module 101 feeds the inputs of Table (I) into neural network 200 and trains the artificial intelligence module 102 using one half of the squared error (i.e., the squared error with a coefficient of 1/2) as the loss function, then the loss function is C = C0 + C1 + C2, where...
[0035]
[0036]
[0037]
[0038] Here, C0 is evaluated at the current values of w10 (0.5 in this embodiment), w20 (0.5 in this embodiment), and b0 (0.3 in this embodiment).
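The loss terms can be written out as follows. This is a reconstruction from the stated 1/2 squared-error form and the identity activation of this example, not a quotation of the original equations; the symbol $Y_i$ for the expected output is taken from Table (I).

$$C = C_0 + C_1 + C_2, \qquad C_i = \tfrac{1}{2}\,(y_i - Y_i)^2$$

$$C_0 = \tfrac{1}{2}\,\bigl(w_{00}\,x_0 + w_{10}\,x_1 + w_{20}\,x_2 + b_0 - Y_0\bigr)^2$$

With the values of Tables (I) and (II), this gives

$$C_0 = \tfrac{1}{2}\,(0.2\cdot 5 + 0.5\cdot 3 + 0.5\cdot 7 + 0.3 - 6)^2 = \tfrac{1}{2}\,(6.3 - 6)^2 = 0.045$$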
[0039] Because weight w00 (currently 0.2) was frozen after training on the first training set, its value does not need to be updated and can be held fixed during training on the second training set. Since weight w00 is not updated, the processing module 101 only needs to calculate the partial derivatives of the loss function C with respect to the other weights w01, w02, w10, w11, w12, w20, w21, and w22 and the biases b0, b1, and b2 in order to update their values. The update of weights w10 and w20 and bias b0 is taken as an example below. Because weights w10 and w20 and bias b0 appear only in the term C0, calculating the partial derivatives of the loss function C with respect to w10, w20, and b0 only requires calculating the partial derivatives of C0 with respect to them. In this embodiment, it is assumed that the current output value corresponding to the input values x0 = 5, x1 = 3, x2 = 7 is y0 = 6.3. Through symbolic computation, the processing module 101 can obtain
[0040]
[0041]
[0042]
[0043]
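The partial derivatives referred to above can be reconstructed as follows, assuming the 1/2 squared-error loss and the identity activation of this example (so that y0 = z0); $Y_0$ denotes the expected output for number 0 in Table (I). This is a derivation under those assumptions rather than the original equations, and its numerical values agree with the gradient-descent updates in paragraph [0044].

$$\frac{\partial C}{\partial w_{10}} = \frac{\partial C_0}{\partial w_{10}} = (y_0 - Y_0)\,x_1 = (6.3 - 6)\cdot 3 = 0.9$$

$$\frac{\partial C}{\partial w_{20}} = \frac{\partial C_0}{\partial w_{20}} = (y_0 - Y_0)\,x_2 = (6.3 - 6)\cdot 7 = 2.1$$

$$\frac{\partial C}{\partial b_0} = \frac{\partial C_0}{\partial b_0} = y_0 - Y_0 = 6.3 - 6 = 0.3$$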
[0044] In the first training epoch on the second training set, because the value of weight w00 is fixed at 0.2, the value of w00 after the update is still its original value. Using the gradient descent algorithm with a learning rate of 0.01, weight w10 is updated to 0.5 - 0.01·(0.9) = 0.491, weight w20 is updated to 0.5 - 0.01·(2.1) = 0.479, and bias b0 is updated to 0.3 - 0.01·(0.3) = 0.297. It is worth noting that substituting the updated parameters into the loss function C (equation Eq2) shows that the new value of the loss function C is smaller than the previous value, consistent with a downward trend. Subsequent training epochs follow the same pattern.
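The update described in this paragraph can be reproduced with a short sketch. It assumes the 1/2 squared-error loss and identity activation of the example and considers only the parameters feeding output y0; the names `sgd_step` and `frozen` are illustrative and not taken from the specification.

```python
# One gradient-descent step for output neuron 202 with weight w00 frozen.
# Assumptions: loss C0 = 0.5 * (y0 - Y0)**2 and identity activation.

def forward(params, x):
    return (params["w00"] * x[0] + params["w10"] * x[1]
            + params["w20"] * x[2] + params["b0"])

def loss(params, x, target):
    return 0.5 * (forward(params, x) - target) ** 2

def sgd_step(params, x, target, lr, frozen):
    """Return updated parameters; names listed in `frozen` keep their values."""
    error = forward(params, x) - target                  # y0 - Y0
    factor = {"w00": x[0], "w10": x[1], "w20": x[2], "b0": 1.0}
    updated = dict(params)
    for name in params:
        if name in frozen:
            continue                                     # no gradient is computed for frozen parameters
        updated[name] = params[name] - lr * error * factor[name]
    return updated

x, target = [5.0, 3.0, 7.0], 6.0                         # Table (I): inputs and expected y0
params = {"w00": 0.2, "w10": 0.5, "w20": 0.5, "b0": 0.3}  # Table (II)

updated = sgd_step(params, x, target, lr=0.01, frozen={"w00"})
print({name: round(value, 3) for name, value in updated.items()})
# {'w00': 0.2, 'w10': 0.491, 'w20': 0.479, 'b0': 0.297}
print(loss(updated, x, target) < loss(params, x, target))  # True
```

Skipping frozen names inside the loop mirrors the point made in the text: the partial derivative with respect to a frozen parameter is never evaluated, which is where the savings come from when the frozen parameter set is large.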
[0045] It is worth noting that the backpropagation algorithm for the aforementioned single-layer neural network first obtains, through symbolic computation, the partial derivatives of the loss function C with respect to the weights w01, w02, w10, w11, w12, w20, w21, and w22 and the biases b0, b1, and b2, expressed in the form of equation (Eq3), and then substitutes the values of these weights and biases to obtain the numerical values of those partial derivatives. The processing module 101 can also use the backpropagation algorithm for a multi-layer neural network to obtain the numerical values of the partial derivatives of the loss function C with respect to the weights w01, w02, w10, w11, w12, w20, w21, and w22 and the biases b0, b1, and b2.
[0046] It is also worth noting that, although the example of Figure 2A and Figure 2B uses, for ease of explanation, neural network 200 and a second training set containing a single training datum, the approach shown in that example of freezing the frozen parameter set and training the artificial intelligence module based on the second training set can be applied to any form of artificial intelligence module containing trainable parameters and to training sets of any size.
[0047] In some embodiments of the present invention, the artificial intelligence module 102 includes a neural network built by a module provided by PyTorch, and the processing module 101 sets "param.requires_grad = False" for the element param of the frozen parameter set and skips the "optimizer.step(), optimizer.zero_grad()" instructions to stop updating the elements in the frozen parameter set (i.e., freeze the frozen parameter set).
[0048] In the foregoing embodiments, before adjusting or training the AI module 102 using a large amount of training data (the second training set), a small amount of training data (the first training set) is used to trigger parameter adjustments in the AI module 102. If some parameters of the AI module 102 meet the condition of remaining unchanged during training, these parameters are set as elements in the frozen parameter set. Then, during large-scale learning based on the second training set, these parameters are directly avoided without adjustment (in other words, only some parameters of the AI module 102 are trained, or, if the AI module 102 contains a neural network, only a local portion of the neural network is trained) to save resources. When the AI module 102 contains a large number of parameters (e.g., the AI module 102 contains a large language model), the method disclosed in the foregoing embodiments can save considerable resources and reduce training time. Furthermore, when the AI module 102 contains a pre-trained large language model, the aforementioned training method does not affect the versatility of the large language model.
[0049] Figure 5 is a schematic diagram of a verification procedure illustrated according to some embodiments of the present invention. Referring to Figure 1, Figure 4, and Figure 5 together, in the embodiment illustrated in Figure 5, the processing module 101 transmits the artificial intelligence module 102, trained on the first training set, to an external verification system. The external verification system returns an indication signal regarding the verification result of the artificial intelligence module 102. The verification system generates this indication signal based on an expert verification result. In this embodiment, the verification procedure includes steps S501 to S504. In step S501, the processing module 101 receives the external indication signal regarding the verification result of the artificial intelligence module 102. In step S502, the processing module 101 confirms whether the indication signal indicates an improvement in the performance of the artificial intelligence module 102. If yes, it executes step S503; otherwise, it executes step S504. In step S503, in response to the indication signal indicating an improvement in the performance of the artificial intelligence module 102, the processing module 101 confirms that the performance of the artificial intelligence module 102 has improved. In step S504, in response to the indication signal indicating that the performance of the artificial intelligence module 102 has not improved, the processing module 101 confirms that the performance of the artificial intelligence module 102 has not improved.
[0050] In some embodiments of the present invention, the artificial intelligence module 102 includes a pre-trained T5-small model. Before adjusting the pre-trained T5-small model with a second training set, the processing module 101 first trains the artificial intelligence module 102 with a first training set containing fewer elements in step S401. The processing module 101 then transmits the artificial intelligence module 102 trained with the first training set to an external verification system. The external verification system integrates expert opinions on the performance of the artificial intelligence module 102 trained with the first training set and sends back an indication signal regarding the verification results of the artificial intelligence module 102.
[0051] Figure 6 is a schematic diagram of a verification procedure illustrated according to some embodiments of the present invention. Referring to Figure 1, Figure 4, and Figure 6 together, in the embodiment illustrated in Figure 6, the verification procedure includes steps S601 to S604. In step S601, the processing module 101 verifies the artificial intelligence module 102 based on a verification set to obtain an accuracy rate, wherein each element of the verification set has a correct answer label. The processing module 101 inputs each element of the verification set into the artificial intelligence module 102 and compares the output of the artificial intelligence module 102 with the corresponding correct answer label to obtain the ratio of correct answers output by the artificial intelligence module 102, and uses this ratio as the accuracy rate. In step S602, the processing module 101 determines whether the accuracy rate is greater than a predetermined accuracy rate. If yes, step S603 is executed; otherwise, step S604 is executed. In step S603, in response to the accuracy rate being greater than the predetermined accuracy rate, the processing module 101 confirms that the performance of the artificial intelligence module 102 has improved. In step S604, in response to the accuracy rate not being greater than the predetermined accuracy rate, the processing module 101 confirms that the performance of the artificial intelligence module 102 has not improved.
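Steps S601 to S604 can be sketched as below. This is a minimal illustration, assuming the module can be called like a function on a verification element and that outputs are compared with labels by equality; `accuracy`, `performance_improved`, and the toy parity model are hypothetical names introduced only for this example.

```python
def accuracy(model, verification_set):
    """S601: ratio of elements whose model output equals the correct answer label."""
    correct = sum(1 for item, label in verification_set if model(item) == label)
    return correct / len(verification_set)

def performance_improved(model, verification_set, predetermined_accuracy):
    """S602 to S604: improved only if the accuracy rate exceeds the predetermined one."""
    return accuracy(model, verification_set) > predetermined_accuracy

# Toy stand-in for artificial intelligence module 102: labels integers by parity.
def toy_model(n):
    return "even" if n % 2 == 0 else "odd"

# The last label deliberately disagrees with the toy model.
verification_set = [(1, "odd"), (2, "even"), (3, "odd"), (4, "odd")]

print(accuracy(toy_model, verification_set))                   # 0.75
print(performance_improved(toy_model, verification_set, 0.7))  # True
print(performance_improved(toy_model, verification_set, 0.75)) # False
```

The strict comparison follows the text: an accuracy rate equal to the predetermined accuracy rate counts as not improved.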
[0052] Figure 7 is a schematic diagram of a training method flow illustrated according to some embodiments of the present invention. Referring to Figure 1, Figure 4, and Figure 7 together, in the embodiment illustrated in Figure 7, step S404 includes steps S701 to S705 to obtain the frozen parameter set. In step S701, the processing module 101 calculates the parameter value change of a current parameter among the parameters of the artificial intelligence module 102. In step S702, the processing module 101 determines whether the parameter value change of the current parameter satisfies the unchanged condition; if yes, it executes step S703; otherwise, it executes step S704. In step S703, in response to the parameter value change of the current parameter satisfying the unchanged condition, the processing module 101 sets the current parameter as an element of the frozen parameter set. In step S704, the processing module 101 determines whether there are any unselected parameters among the parameters of the artificial intelligence module 102; if yes, it executes step S705; otherwise, it terminates the current procedure. In step S705, in response to the presence of unselected parameters among the parameters of the artificial intelligence module 102, an unselected parameter is selected as the current parameter, and the process returns to step S701.
[0053] In some embodiments of the present invention, the aforementioned unchanged condition is that the change in parameter value is less than or equal to a preset change value. In some embodiments of the present invention, the aforementioned unchanged condition is that the change in parameter value is 0, that is, the parameter value has not changed.
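The selection loop of steps S701 to S705, together with the two forms of the unchanged condition, can be sketched as follows. Two things here are assumptions rather than statements from the specification: the magnitude of the change is compared against the preset change value, and the "before" values are invented for illustration (only the "after" values come from Table (II)). The function name is hypothetical.

```python
def select_frozen_parameters(before, after, preset_change=0.0):
    """Return the names of parameters whose value change satisfies the unchanged condition.

    before / after map parameter names to values before and after training
    on the first training set. With preset_change=0.0 the condition reduces
    to "the parameter value has not changed".
    """
    frozen = set()
    for name in before:                      # S704/S705: visit every parameter once
        change = after[name] - before[name]  # S701: value after minus value before
        if abs(change) <= preset_change:     # S702 (assumption: compare magnitudes)
            frozen.add(name)                 # S703
    return frozen

before = {"w00": 0.2, "w10": 0.4, "w20": 0.6, "b0": 0.1}  # illustrative values
after = {"w00": 0.2, "w10": 0.5, "w20": 0.5, "b0": 0.3}   # Table (II)

print(select_frozen_parameters(before, after))  # {'w00'}
```

An empty result corresponds to the case in which the parameters do not include a frozen parameter set.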
[0054] Referring to Figure 4, in some embodiments of the present invention, the training method further includes step S407. In this embodiment, when the processing module 101 confirms in step S403 that the performance of the artificial intelligence module 102 has not improved, the processing module 101 executes step S407. In step S407, in response to the lack of improvement in the performance of the artificial intelligence module 102, the processing module 101 uses a new training set as the first training set and returns to step S401.
[0055] In some embodiments of the present invention, the number of elements in the new training set is greater than the number of elements in the original first training set.
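Putting steps S401 to S407 together, the overall control flow might look like the sketch below. All callables are hypothetical placeholders supplied by the caller. Two behaviors are assumptions not settled by the text: the list of candidate first training sets is finite, and training on the second training set still proceeds when the frozen parameter set is empty.

```python
def two_stage_training(train, verify, select_frozen, freeze, first_sets, second_set):
    """Run steps S401 to S407 with caller-supplied callables.

    first_sets is tried in order; each later entry plays the role of the
    "new training set" of step S407. Returns the frozen parameter set used
    for the second stage, or None if no first training set led to improvement.
    """
    for first_set in first_sets:
        train(first_set)              # S401
        if not verify():              # S402 and S403
            continue                  # S407: retry with a new first training set
        frozen = select_frozen()      # S404
        if frozen:                    # non-empty frozen parameter set
            freeze(frozen)            # S405
        train(second_set)             # S406
        return frozen
    return None

# Usage with stub callables that only record what happened.
log = []
result = two_stage_training(
    train=lambda data: log.append(("train", len(data))),
    verify=iter([False, True]).__next__,  # first attempt fails, second succeeds
    select_frozen=lambda: {"w00"},
    freeze=lambda names: log.append(("freeze", sorted(names))),
    first_sets=[[1, 2], [1, 2, 3]],       # the new first training set is larger
    second_set=list(range(10)),
)
print(log)
# [('train', 2), ('train', 3), ('freeze', ['w00']), ('train', 10)]
```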
[0056] Figure 3 is a block diagram of an electronic device system illustrated according to some embodiments of the present invention. As shown in Figure 3, at the hardware level, electronic device 300 includes a processing unit 301, internal memory 302, and non-volatile memory 303. Internal memory 302 is, for example, random-access memory (RAM). Non-volatile memory 303 is, for example, at least one magnetic disk storage device. Of course, electronic device 300 may also include hardware required for other functions.
[0057] Internal memory 302 and non-volatile memory 303 are used to store programs, which may include program code and computer operation instructions. Internal memory 302 and non-volatile memory 303 provide instructions and data to processing unit 301. Processing unit 301 reads the corresponding computer program from non-volatile memory 303 into internal memory 302 and then runs it, forming training system 100 at the logical level.
[0058] The processing unit 301 may be an integrated circuit chip with signal processing capabilities. In implementation, the methods and steps disclosed in the foregoing embodiments can be performed through hardware integrated logic circuits or software instructions within the processing unit 301. The processing unit 301 may be a general-purpose processor, including a central processing unit, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, capable of implementing or executing the methods and steps disclosed in the foregoing embodiments.
[0059] This specification also provides a computer-readable storage medium that stores at least one instruction. When executed by the processing unit 301 of the electronic device 300, the at least one instruction enables the processing unit 301 of the electronic device 300 to perform the methods and steps disclosed in the foregoing embodiments.
[0060] Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other internal memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
[0061] In the training system, training method, computer-readable recording medium containing stored programs, and non-transitory computer program product provided in the foregoing embodiments, the parameters of the artificial intelligence module 102 are first adjusted with a small amount of training data (the first training set) before the module is adjusted or trained with a large amount of training data (the second training set). Any parameters of the artificial intelligence module 102 that satisfy the unchanged condition during this initial training are set as elements of the frozen parameter set. During the subsequent, more extensive training on the second training set, these parameters are skipped and left unadjusted, which saves resources. When the artificial intelligence module 102 contains a large number of parameters (for example, when it contains a large language model), the method disclosed in the foregoing embodiments can save considerable resources and reduce training time. Furthermore, when the artificial intelligence module 102 contains a pre-trained large language model, the aforementioned training method does not affect the versatility of the large language model.
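The two-stage flow summarized above can be illustrated with a minimal sketch. It is not the implementation of the embodiments: the linear model, the per-sample gradient update, the use of validation error against a baseline as the verification procedure, and every name in the code (train, find_frozen_set, two_stage_training, preset_change) are assumptions chosen only to keep the example small and runnable.

```python
def predict(weights, x):
    """Toy linear model (an assumption): weighted sum of the inputs."""
    return sum(w * xi for w, xi in zip(weights, x))


def mean_squared_error(weights, data):
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def train(weights, data, frozen=frozenset(), lr=0.1, epochs=20):
    """Per-sample gradient descent; indices in `frozen` are never updated."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                if i in frozen:
                    continue  # frozen parameter: skip the update entirely
                w[i] -= lr * err * xi
    return w


def find_frozen_set(before, after, preset_change=0.0):
    """Indices whose value changed by no more than `preset_change`."""
    return frozenset(
        i for i, (b, a) in enumerate(zip(before, after))
        if abs(a - b) <= preset_change
    )


def two_stage_training(weights, first_set, second_set, validation_set):
    # Stand-in verification (an assumption): validation error must drop.
    baseline = mean_squared_error(weights, validation_set)
    trial = train(weights, first_set)          # small first training set
    if mean_squared_error(trial, validation_set) >= baseline:
        # No improvement: the caller would supply a new first training set.
        return list(weights), frozenset()
    frozen = find_frozen_set(weights, trial)   # parameters that did not change
    final = train(trial, second_set, frozen=frozen)  # larger second training set
    return final, frozen


# Hypothetical data: the third input is always 0, so its weight never moves.
initial = [0.0, 0.0, 5.0]
first_set = [((1, 0, 0), 2), ((0, 1, 0), 3)]
second_set = first_set + [((1, 1, 0), 5), ((2, 1, 0), 7), ((1, 2, 0), 8)]
validation_set = [((1, 1, 0), 5), ((2, 0, 0), 4)]

final, frozen = two_stage_training(initial, first_set, second_set, validation_set)
print("frozen indices:", sorted(frozen))
print("final weights:", [round(w, 3) for w in final])
```

In this sketch the saving comes from the skipped update inside train. In a real framework the frozen parameters would more likely be excluded from gradient computation altogether, which is where most of the resource reduction would be expected.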
[0062] Although the present invention has been disclosed above by way of embodiments, it is not intended to limit the present invention. Any person skilled in the art may make some modifications and refinements without departing from the spirit and scope of the present invention. Therefore, the scope of protection of the present invention shall be determined by the appended claims.
[0063] [Symbol Explanation]
[0064] 100: Training System
[0065] 101: Processing Module
[0066] 102: Artificial Intelligence Module
[0067] 200: Neural Networks
[0068] 201, 203, 205: Input neurons
[0069] 202, 204, 206: Output neurons
[0070] 207, 208, 209: Addition Module
[0071] 210: Activation Function Module
[0072] x0, x1, x2: Input values
[0073] y0, y1, y2: Output values
[0074] w00, w01, w02, w10, w11, w12, w20, w21, w22: Weights
[0075] b0, b1, b2: Biases
[0076] z0, z1, z2: Weighted input
[0077] α: Activation function
[0078] 300: Electronic device
[0079] 301: Processing Unit
[0080] 302: Internal Memory
[0081] 303: Non-volatile memory
[0082] S401~S407, S501~S504, S601~S604, S701~S705: Steps
Claims
1. A training system, comprising: an artificial intelligence module containing a plurality of parameters; and a processing module configured to execute: (a) training the artificial intelligence module based on a first training set, and performing a verification procedure to confirm whether the performance of the artificial intelligence module has improved; (b) in response to the performance of the artificial intelligence module having improved, confirming whether the parameters include a frozen parameter set, wherein the change in parameter value of each parameter in the frozen parameter set satisfies an unchanged condition; and (c) in response to the parameters including the frozen parameter set, freezing the frozen parameter set and training the artificial intelligence module based on a second training set, wherein the number of elements in the second training set is greater than the number of elements in the first training set.
2. The training system according to claim 1, wherein the processing module is configured to execute: (b1) in response to the performance of the artificial intelligence module not having improved, using a new training set as the first training set and returning to step (a).
3. The training system according to claim 1, wherein the verification procedure comprises: (a1) receiving an external indication signal; (a2) in response to the indication signal indicating that the performance of the artificial intelligence module has improved, confirming that the performance of the artificial intelligence module has improved; and (a3) in response to the indication signal indicating that the performance of the artificial intelligence module has not improved, confirming that the performance of the artificial intelligence module has not improved.
4. The training system according to claim 1, wherein the verification procedure comprises: (a1) validating the artificial intelligence module based on a validation set to obtain an accuracy rate; (a2) in response to the accuracy rate being greater than a predetermined accuracy rate, confirming that the performance of the artificial intelligence module has improved; and (a3) in response to the accuracy rate not being greater than the predetermined accuracy rate, confirming that the performance of the artificial intelligence module has not improved.
5. The training system according to claim 1, wherein the artificial intelligence module comprises a large language model.
6. The training system according to claim 1, wherein the unchanged condition is that the change in the parameter value is less than or equal to a preset change value.
7. The training system according to claim 1, wherein the unchanged condition is that the change in the parameter value is 0.
8. The training system according to claim 1, wherein step (b) comprises: (b1) for a current parameter among the parameters, determining whether the change in the parameter value of the current parameter satisfies the unchanged condition; (b2) in response to the change in the parameter value of the current parameter satisfying the unchanged condition, setting the current parameter as an element of the frozen parameter set; and (b3) in response to at least one of the parameters not yet having been selected, selecting an unselected parameter as the current parameter and returning to step (b1).
9. A training method for training an artificial intelligence module, executed by a processing module, the artificial intelligence module including a plurality of parameters, the training method comprising: (a) training the artificial intelligence module based on a first training set, and performing a verification procedure to confirm whether the performance of the artificial intelligence module has improved; (b) in response to the performance of the artificial intelligence module having improved, confirming whether the parameters include a frozen parameter set, wherein the change in parameter value of each parameter in the frozen parameter set satisfies an unchanged condition; and (c) in response to the parameters including the frozen parameter set, freezing the frozen parameter set and training the artificial intelligence module based on a second training set, wherein the number of elements in the second training set is greater than the number of elements in the first training set.
10. The training method according to claim 9, wherein step (b) comprises: (b1) for a current parameter among the parameters, determining whether the change in the parameter value of the current parameter satisfies the unchanged condition; (b2) in response to the change in the parameter value of the current parameter satisfying the unchanged condition, setting the current parameter as an element of the frozen parameter set; and (b3) in response to at least one of the parameters not yet having been selected, selecting an unselected parameter as the current parameter and returning to step (b1).
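For illustration only, the verification procedure of claim 4 and the retry of claim 2 can be read together as the loop sketched below. The one-dimensional threshold classifier, the choice to retrain from scratch on each new first training set, and all names (fit_threshold, accuracy, performance_improved, train_until_improved) are assumptions made for this example and are not taken from the claims.

```python
def accuracy(predict, validation_set):
    """Fraction of validation samples whose predicted label is correct."""
    correct = sum(1 for x, label in validation_set if predict(x) == label)
    return correct / len(validation_set)


def performance_improved(predict, validation_set, predetermined_accuracy):
    # Improved only if the accuracy rate is strictly greater than the
    # predetermined accuracy rate; otherwise treated as not improved.
    return accuracy(predict, validation_set) > predetermined_accuracy


def fit_threshold(first_set):
    """Hypothetical stand-in for training: a 1-D threshold classifier
    placed midway between the mean inputs of the two classes."""
    zeros = [x for x, label in first_set if label == 0]
    ones = [x for x, label in first_set if label == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x > threshold else 0


def train_until_improved(train_fn, candidate_first_sets, validation_set,
                         predetermined_accuracy):
    """Try each candidate first training set in turn and return the first
    model that passes verification, or (None, None) if none does."""
    for first_set in candidate_first_sets:
        predict = train_fn(first_set)
        if performance_improved(predict, validation_set,
                                predetermined_accuracy):
            return predict, first_set
        # Not improved: fall through and use a new first training set.
    return None, None


# Hypothetical data: the true class boundary lies between 2 and 3.
validation_set = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]
unrepresentative_set = [(-4.0, 0), (3.0, 1)]
representative_set = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]

model, used = train_until_improved(
    fit_threshold,
    [unrepresentative_set, representative_set],
    validation_set,
    predetermined_accuracy=0.9,
)
print("first training set that passed verification:", used)
```

Whether the model state carries over between attempts is not stated in the claims; the sketch retrains from scratch for simplicity.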