[0020] The embodiments of the present application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals indicate the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary, and are intended to explain the application, but should not be understood as a limitation to the application.
[0021] The information interaction method and system of the embodiments of the present application are described below with reference to the drawings.
[0022] Figure 1 is a flowchart of an information interaction method according to an embodiment of the present application. This embodiment is described from the side of the second device, where the second device may be a slave.
[0023] As shown in Figure 1, the information interaction method includes:
[0024] S101: Determine whether an exit command is received; if not, read current weight information from the first device and calculate gradient information according to the read current weight information.
[0025] Specifically, the second device, such as the slave, determines whether an exit command has been received, where the exit command may come from the first device, such as the master, or from another external device. If no exit command has been received, the second device reads the current weight information from the first device and calculates the gradient information according to the read current weight information; if an exit command has been received, the second device stops working.
[0026] S102: Return gradient information greater than or equal to a preset gradient threshold to the first device, so that the first device recalculates weight information according to the returned gradient information and uses the weight information greater than or equal to a preset weight threshold as the current weight information; repeat the above operations until the exit command is received.
[0027] After the second device calculates the gradient information, it does not send all of the calculated gradient information to the first device. Instead, it sends gradient information to the first device based on a gradient threshold mechanism, that is, it sends only the gradient information greater than or equal to the preset gradient threshold. Specifically, gradient information greater than or equal to the preset gradient threshold may be returned directly to the first device, or gradient information less than the preset gradient threshold may be set to zero and only the non-zero gradient information returned to the first device. Returning only part of the gradient information to the first device effectively reduces the amount of communication between the first device and the second device, thereby improving communication efficiency.
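The gradient threshold mechanism described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function names `sparsify` and `densify`, the index-to-value payload format, and the sample numbers are all assumptions. The comparison uses the absolute value of each element, following the description of step S303 in paragraph [0054].

```python
from typing import Dict, List


def sparsify(gradient: List[float], threshold: float) -> Dict[int, float]:
    """Keep only the elements whose magnitude reaches the threshold.

    The result maps element index to value, so elements below the threshold
    are simply not transmitted (hypothetical payload format).
    """
    return {i: g for i, g in enumerate(gradient) if abs(g) >= threshold}


def densify(sparse: Dict[int, float], size: int) -> List[float]:
    """Rebuild a full-length vector on the receiving side; missing entries are zero."""
    dense = [0.0] * size
    for i, g in sparse.items():
        dense[i] = g
    return dense


gradient = [0.5, -0.01, 0.0, 0.2, -0.3, 0.02]
payload = sparsify(gradient, threshold=0.1)   # what the second device would send
restored = densify(payload, len(gradient))    # what the first device would see
print(len(payload), "of", len(gradient), "elements transmitted")
```

Here only three of the six elements are transmitted, and the first device treats the rest as zero, which corresponds to the second of the two options described above.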
[0028] The first device recalculates the weight information after receiving the gradient information greater than or equal to the preset gradient threshold. However, in the embodiment of the present application, the first device does not provide all the recalculated weight information for the second device. Instead, it provides weight information for the second device based on the weight threshold mechanism, that is, only provides weight information greater than or equal to the preset weight threshold for the second device. Specifically, weight information greater than or equal to a preset weight threshold may be directly provided to the second device, or weight information less than the preset weight threshold may be set to zero, and non-zero weight information may be provided to the second device. This method of providing only part of the weight information for the second device can effectively reduce the amount of communication between the first device and the second device, thereby improving communication efficiency, and effectively reducing the memory consumption of the second device.
[0029] It should be noted that the values of the preset gradient threshold and the preset weight threshold are very important: they must not only reduce the amount of communication between the first device and the second device, but also ensure, as far as possible, that the above iteration still reaches an approximately optimal solution, that is, that the quality of the resulting trained model is guaranteed.
[0030] Wherein, the above-mentioned gradient information and weight information may both be vectors containing at least one element. The aforementioned preset gradient threshold may be the average value of the elements contained in the corresponding gradient information divided by N, where N is 4-6, preferably 5; the aforementioned preset weight threshold is the average value of the elements contained in the corresponding weight information divided by M, M is 11-13, preferably 12. The above two thresholds are obtained based on continuous adjustment of experimental data, and of course they are also related to the corresponding gradient information and weight information.
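As a rough illustration of the threshold rule above, the following sketch computes both thresholds from sample vectors. The helper name `preset_threshold` and the sample data are hypothetical. One assumption is made loudly here: the "average value of the elements" is taken as the mean of the absolute values, so that the threshold is never negative; the text does not state this explicitly.

```python
from typing import List


def preset_threshold(values: List[float], divisor: float) -> float:
    """Mean of the element magnitudes divided by `divisor`.

    Assumption: the mean is taken over absolute values so the threshold
    is non-negative even when elements have mixed signs.
    """
    if not values:
        raise ValueError("the vector must contain at least one element")
    mean_abs = sum(abs(v) for v in values) / len(values)
    return mean_abs / divisor


gradient = [0.5, -0.25, 0.125, 0.125]   # mean magnitude 0.25
weights = [3.0, -1.5, 0.75, 0.75]       # mean magnitude 1.5

gradient_threshold = preset_threshold(gradient, divisor=5)    # N = 5
weight_threshold = preset_threshold(weights, divisor=12)      # M = 12
print(gradient_threshold, weight_threshold)
```

Because each threshold is derived from the vector it is applied to, it shrinks as the gradients shrink during training, rather than being a fixed constant.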
[0031] The above threshold setting can not only effectively reduce the communication volume between the first device and the second device, but also ensure that an approximate optimal solution is obtained, that is, the quality of the obtained training model is guaranteed.
[0032] It can be seen that in this embodiment, as the iteration progresses, the communication volume steadily decreases, which effectively reduces the usage of network resources and the consumption of cluster resources; each round of iteration takes less time and convergence becomes faster, so that a trained model can be obtained quickly to provide services to users; at the same time, because the communication volume is reduced, the memory consumption of the slave can be greatly reduced.
[0033] It should be noted that the above information interaction method can be applied to many fields, and is especially suitable for generating various training models in the field of machine learning. For example, it can be applied to fields such as handwritten letter recognition, face recognition, or fingerprint recognition, where the implementation process shown in Figure 1 can quickly generate the corresponding recognition model to complete the recognition of handwritten letters, human faces, or fingerprints. Since the information interaction method provided by the embodiment of the present application is implemented on the basis of machine learning principles, users need a better grasp of machine learning, which makes the method harder to use, and its versatility is somewhat lower. In return, it can greatly reduce the communication volume, improve the communication efficiency, and reduce the memory consumption of the second device, such as the slave.
[0034] In the above information interaction method, it is determined whether an exit command from the first device is received; if not, the current weight information is read from the first device and gradient information is calculated according to the read current weight information; gradient information greater than or equal to the preset gradient threshold is returned to the first device, so that the first device recalculates the weight information according to the returned gradient information and uses the weight information greater than or equal to the preset weight threshold as the current weight information; and the above operations are repeated until the exit command is received. This greatly reduces the amount of communication between the first device and the second device, thereby reducing resource consumption, improving communication efficiency, and greatly reducing the memory consumption of the slave.
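The slave-side loop of steps S101 and S102 can be sketched as below. Everything about the transport is an assumption: `StubMaster` is a hypothetical in-process stand-in for the first device, its method names are invented for illustration, and it issues the exit command after a fixed number of rounds without updating the weights.

```python
from typing import Callable, Dict, List


class StubMaster:
    """Hypothetical in-process stand-in for the first device (no real networking)."""

    def __init__(self, weights: List[float], max_rounds: int) -> None:
        self.weights = weights
        self.rounds_left = max_rounds
        self.received: List[Dict[int, float]] = []

    def exit_requested(self) -> bool:
        # Assumed exit condition: a fixed number of rounds has elapsed.
        return self.rounds_left <= 0

    def read_weights(self) -> List[float]:
        return list(self.weights)

    def push_gradient(self, sparse: Dict[int, float]) -> None:
        self.received.append(sparse)
        self.rounds_left -= 1


def run_slave(master: StubMaster,
              compute_gradient: Callable[[List[float]], List[float]],
              gradient_threshold: float) -> int:
    """Repeat S101 and S102 until the exit command arrives; return the round count."""
    rounds = 0
    while not master.exit_requested():            # S101: check for the exit command
        weights = master.read_weights()           # S101: read current weights
        gradient = compute_gradient(weights)      # S101: compute the gradient
        sparse = {i: g for i, g in enumerate(gradient)
                  if abs(g) >= gradient_threshold}  # S102: keep large elements only
        master.push_gradient(sparse)              # S102: return them to the master
        rounds += 1
    return rounds


master = StubMaster([1.0, 0.01, -2.0], max_rounds=3)
# Gradient of sum(w_i ** 2) is 2 * w_i; used here only as a toy objective.
rounds = run_slave(master, lambda w: [2 * x for x in w], gradient_threshold=0.1)
print(rounds, master.received[0])
```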
[0035] Figure 2 is a flowchart of an information interaction method according to another embodiment of the present application. This embodiment is described from the side of the first device, where the first device may be the master.
[0036] As shown in Figure 2, the information interaction method includes:
[0037] S201: Determine whether the exit condition is met, and if not, provide current weight information for at least one second device, and receive gradient information returned by at least one second device that is greater than or equal to a preset gradient threshold.
[0038] Specifically, the first device, such as the master, judges whether the exit condition is met. If not, it provides the current weight information to at least one second device, such as a slave; the at least one second device calculates gradient information based on the read current weight information and returns the gradient information greater than or equal to the preset gradient threshold to the first device. Returning only part of the gradient information to the first device effectively reduces the communication volume between the first device and the second device, thereby improving communication efficiency.
[0039] In addition, if the first device confirms that it meets the exit condition, it sends an exit command to at least one second device to stop the at least one second device from working.
[0040] S202: Recalculate weight information according to the returned gradient information, and use weight information greater than or equal to a preset weight threshold as current weight information, and repeat the foregoing operations until the exit condition is met.
[0041] In this embodiment, the first device receives the gradient information returned by the second device that is greater than or equal to the preset gradient threshold, recalculates the weight information, and uses the weight information greater than or equal to the preset weight threshold as the current weight information. This method of providing only part of the weight information for the second device can effectively reduce the amount of communication between the first device and the second device, thereby improving communication efficiency, and effectively reducing the memory consumption of the second device.
[0042] Specifically, using the weight information greater than or equal to the preset weight threshold as the current weight information may mean directly using the weight information greater than or equal to the preset weight threshold as the current weight information, or it may mean setting the weight information less than the preset weight threshold to zero and using the non-zero weight information as the current weight information.
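A minimal sketch of step S202 with the zeroing option follows. The text does not say how the weights are recalculated from the gradient, so a plain gradient-descent step with a hypothetical `learning_rate` is assumed here; the function name and the sample values are likewise illustrative only.

```python
from typing import List


def update_and_truncate(weights: List[float], gradient: List[float],
                        weight_threshold: float,
                        learning_rate: float = 0.5) -> List[float]:
    """Recalculate the weights, then zero those below the weight threshold.

    Assumption: the recalculation is a gradient-descent step; the source
    only says the weights are recalculated from the returned gradient.
    """
    updated = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return [w if abs(w) >= weight_threshold else 0.0 for w in updated]


weights = [1.0, 0.25, -0.5]
gradient = [0.5, 0.25, 0.0]
current = update_and_truncate(weights, gradient, weight_threshold=0.2)
print(current)  # the middle weight falls below the threshold and is zeroed
```

Only the non-zero entries of `current` would then need to be provided to the second devices.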
[0043] It should be noted that the values of the preset gradient threshold and the preset weight threshold are very important: they must not only reduce the amount of communication between the first device and the second device, but also ensure, as far as possible, that the above iteration still reaches an approximately optimal solution, that is, that the quality of the resulting trained model is guaranteed.
[0044] The aforementioned gradient information and weight information can both be vectors containing at least one element. The preset gradient threshold can be the average value of the elements contained in the corresponding gradient information divided by N, where N is 4-6, preferably 5; the preset weight threshold can be the average value of the elements contained in the corresponding weight information divided by M, where M is 11-13, preferably 12. This threshold setting not only effectively reduces the communication volume between the first device and the second device, but also ensures that an approximately optimal solution is obtained, that is, that the quality of the resulting trained model is guaranteed. The values of the preset gradient threshold and the preset weight threshold are only examples, and can be dynamically adjusted as needed in practical applications.
[0045] It can be seen that in this embodiment, as the iteration progresses, the communication volume steadily decreases, which effectively reduces the usage of network resources and the consumption of cluster resources; each round of iteration takes less time and convergence becomes faster, so that a trained model can be obtained quickly to provide services to users; at the same time, because the communication volume is reduced, the memory consumption of the slave can be greatly reduced.
[0046] It should be noted that the above information interaction method can be applied to many fields, and is especially suitable for generating various training models in the field of machine learning. For example, it can be applied to fields such as handwritten letter recognition, face recognition, or fingerprint recognition, where the implementation process shown in Figure 2 can quickly generate the corresponding recognition model to complete the recognition of handwritten letters, human faces, or fingerprints. Since the information interaction method provided by the embodiment of the present application is implemented on the basis of machine learning principles, users need a better grasp of machine learning, which makes the method harder to use, and its versatility is somewhat lower. In return, it can greatly reduce the communication volume, improve the communication efficiency, and reduce the memory consumption of the second device, such as the slave.
[0047] In the above information interaction method, the first device receives the gradient information returned by at least one second device that is greater than or equal to the preset gradient threshold, and the first device provides the at least one second device with weight information greater than or equal to the preset weight threshold. This greatly reduces the communication volume between the first device and the second device, thereby reducing resource consumption, improving communication efficiency, and greatly reducing the memory consumption of the slave.
[0048] Figure 3 is a schematic diagram of an information interaction process according to an embodiment of the present application. This embodiment uses a master and slaves as examples to describe the interaction process of gradient information and weight information.
[0049] As shown in Figure 3, the information interaction process includes:
[0050] S301: If the slave does not receive the exit command from the master, it reads the weight information from the master.
[0051] If the slave receives an exit command from the master, it stops working, that is, the interactive process ends.
[0052] S302: The slave calculates gradient information according to the read weight information.
[0053] S303: The slave pushes non-zero gradient information back to the master based on the gradient threshold mechanism.
[0054] Specifically, the slave sets the gradient elements whose absolute value is less than the preset gradient threshold to zero, and only pushes non-zero gradient information to the master, thereby reducing the amount of communication.
[0055] S304: The master judges whether the exit condition is met. If the exit condition is not met, it provides weights to all slaves and proceeds to S305; if the exit condition is met, it sends an exit command to all slaves.
[0056] S305: The master accumulates the gradient information sent back by all slaves.
[0057] S306: The master updates the weights according to the accumulated gradient information, provides non-zero weights to all slaves based on the weight threshold mechanism, and then returns to S301.
[0058] Specifically, the master sets the weight elements whose absolute value is less than the preset weight threshold to zero, and provides only non-zero weight information to the slaves, thereby reducing the amount of communication.
[0059] It can be seen that after multiple rounds of interaction, as the iteration gets closer and closer to the optimal solution, the gradient elements on the slave that are less than the preset gradient threshold can be truncated (set to 0, that is, thresholded), so these truncated gradients do not need to be sent to the master. When the master updates the weights, the weights that are less than the preset weight threshold are truncated to 0, so the slave does not need to read these zero weights. These steps speed up communication and reduce the communication volume, while greatly reducing the memory consumption of the slave.
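The whole S301 to S306 cycle can be simulated in a single process as a rough sketch. Several things here are assumptions rather than details from the application: the linear model with a squared-error gradient, the gradient-descent update with a hypothetical `learning_rate`, the fixed round count used as the exit condition, and the use of the mean of absolute values in the threshold. The divisors 5 and 12 are the preferred values of N and M given earlier.

```python
from typing import List, Tuple

Sample = Tuple[List[float], float]  # (features, label)


def truncate(values: List[float], divisor: float) -> List[float]:
    """Zero every element whose magnitude is below mean(|values|) / divisor."""
    threshold = sum(abs(v) for v in values) / len(values) / divisor
    return [v if abs(v) >= threshold else 0.0 for v in values]


def slave_gradient(samples: List[Sample], weights: List[float]) -> List[float]:
    """S302 and S303: squared-error gradient of a linear model, then truncation."""
    grad = [0.0] * len(weights)
    for features, label in samples:
        error = sum(w * x for w, x in zip(weights, features)) - label
        for j, x in enumerate(features):
            grad[j] += error * x / len(samples)
    return truncate(grad, divisor=5)


def train(shards: List[List[Sample]], dim: int, rounds: int = 60,
          learning_rate: float = 0.25) -> List[float]:
    weights = [0.0] * dim
    for _ in range(rounds):  # S304: exit condition assumed to be a fixed round count
        accumulated = [0.0] * dim
        for samples in shards:  # S301 to S303, one pass per slave
            grad = slave_gradient(samples, weights)
            accumulated = [a + g for a, g in zip(accumulated, grad)]  # S305
        # S306: gradient-descent update (an assumed rule), then weight truncation
        weights = [w - learning_rate * a for w, a in zip(weights, accumulated)]
        weights = truncate(weights, divisor=12)
    return weights


# Two slaves whose toy data follows label = 2 * features[0].
shards = [
    [([1.0, 0.0], 2.0), ([0.0, 1.0], 0.0)],
    [([1.0, 1.0], 2.0)],
]
final_weights = train(shards, dim=2)
print(final_weights)
```

On this toy data the second weight is irrelevant to the labels, so it is the one that the weight threshold eventually truncates to zero while the first weight approaches 2.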
[0060] Figure 4 is a schematic structural diagram of an information interaction system according to an embodiment of the present application.
[0061] As shown in Figure 4, the information interaction system includes a first device 41 and at least one second device 42, wherein:
[0062] The first device 41 is used to determine whether the exit condition is met and, if not, to provide the current weight information to the at least one second device 42, receive the gradient information returned by the at least one second device 42 that is greater than or equal to the preset gradient threshold, recalculate the weight information according to the returned gradient information, use the weight information greater than or equal to the preset weight threshold as the current weight information, and repeat the above operations until the exit condition is met;
[0063] The at least one second device 42 is used to determine whether an exit command is received and, if not, to read the current weight information from the first device 41, calculate the gradient information according to the read current weight information, return the gradient information greater than or equal to the preset gradient threshold to the first device 41, and repeat the above operations until an exit command is received.
[0064] In this embodiment, after the at least one second device 42 calculates the gradient information, it does not send all of the calculated gradient information to the first device 41, but sends gradient information to the first device 41 based on the gradient threshold mechanism. Specifically, the at least one second device 42 may be used to directly return gradient information greater than or equal to the preset gradient threshold to the first device 41, or to set gradient information less than the preset gradient threshold to zero and return the non-zero gradient information to the first device 41.
[0065] Similarly, the first device 41 may be used to directly use weight information greater than or equal to the preset weight threshold as the current weight information, or to set weight information less than the preset weight threshold to zero and use the non-zero weight information as the current weight information.
[0066] It should be noted that the values of the preset gradient threshold and the preset weight threshold are very important: they must not only reduce the amount of communication between the first device and the second device, but also ensure, as far as possible, that the above iteration still reaches an approximately optimal solution, that is, that the quality of the resulting trained model is guaranteed.
[0067] The aforementioned gradient information and weight information can both be vectors containing at least one element. The preset gradient threshold can be the average value of the elements contained in the corresponding gradient information divided by N, where N is 4-6, preferably 5; the preset weight threshold can be the average value of the elements contained in the corresponding weight information divided by M, where M is 11-13, preferably 12. The above two thresholds are obtained through continuous adjustment based on experimental data, and they are of course also related to the corresponding gradient information and weight information. The values of the preset gradient threshold and the preset weight threshold are only examples, and can be dynamically adjusted as needed in practical applications.
[0068] The foregoing method of returning only part of the gradient information to the first device and only providing part of the weight information to the second device can effectively reduce the communication volume between the first device and the second device, thereby improving communication efficiency.
[0069] In addition, the first device 41 can also be used to send an exit command to the at least one second device 42 if the exit condition is met, and the at least one second device 42 can also be used to stop working if the exit command is received.
[0070] The first device may be a server (master), and the second device may be a worker (slave). For the interaction process between the first device and the second device, see Figure 1, Figure 2, or Figure 3 and the corresponding text description, which are not repeated here.
[0071] It can be seen that in this embodiment, as the iteration progresses, the communication volume steadily decreases, which effectively reduces the usage of network resources and the consumption of system resources; each round of iteration takes less time and convergence becomes faster, so that a trained model can be obtained quickly to provide services to users; at the same time, because the communication volume is reduced, the memory consumption of the slave can be greatly reduced.
[0072] It should be noted that the above information interaction system can be applied in many fields, and is especially suitable for generating various training models in the field of machine learning. For example, it can be applied to fields such as handwritten letter recognition, face recognition, or fingerprint recognition, where the corresponding recognition model can be generated quickly to complete the recognition of handwritten letters, human faces, or fingerprints. Since the information interaction system provided by the embodiments of the present application is implemented on the basis of machine learning principles, users need a better grasp of machine learning, which makes the system harder to use, and its versatility is somewhat lower. In return, it can greatly reduce the communication volume, improve the communication efficiency, and reduce the memory consumption of the second device, such as the slave.
[0073] In the above information interaction system, the at least one second device returns gradient information greater than or equal to the preset gradient threshold to the first device, and the first device provides weight information greater than or equal to the preset weight threshold to the at least one second device. This greatly reduces the amount of communication between the first device and the second device, reduces resource consumption, improves communication efficiency, and greatly reduces the memory consumption of the slave.
[0074] In the description of this specification, descriptions with reference to the terms "one embodiment", "some embodiments", "examples", "specific examples", or "some examples" mean that the specific features, structures, materials, or characteristics described in conjunction with the embodiment or example are included in at least one embodiment or example of the present application. In this specification, the schematic representations of the above terms do not necessarily refer to the same embodiment or example. Moreover, the described specific features, structures, materials, or characteristics can be combined in a suitable manner in any one or more embodiments or examples. In addition, those skilled in the art can combine the different embodiments or examples described in this specification, and the features of those embodiments or examples, provided they do not contradict each other.
[0074] In the description of this specification, descriptions with reference to the terms "one embodiment", "some embodiments", "examples", "specific examples", or "some examples" etc. mean specific features described in conjunction with the embodiment or example , The structure, materials, or characteristics are included in at least one embodiment or example of the present application. In this specification, the schematic representations of the above-mentioned terms do not necessarily refer to the same embodiment or example. Moreover, the described specific features, structures, materials or characteristics can be combined in any one or more embodiments or examples in a suitable manner. In addition, those skilled in the art can combine and combine the different embodiments or examples and the characteristics of the different embodiments or examples described in this specification without contradicting each other.
[0075] In addition, the terms "first" and "second" are only used for descriptive purposes, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, the features defined with "first" and "second" may explicitly or implicitly include at least one of the features. In the description of the present application, "a plurality of" means at least two, such as two, three, etc., unless specifically defined otherwise.
[0076] Any process or method description in the flowchart or otherwise described herein can be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in reverse order according to the functions involved, as should be understood by those skilled in the art to which the embodiments of this application belong.
[0077] The logic and/or steps represented in the flowchart or otherwise described herein can, for example, be considered a sequenced list of executable instructions for implementing logic functions, and can be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any apparatus that can contain, store, communicate, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) with one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber optic device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
[0078] It should be understood that each part of this application can be implemented by hardware, software, firmware, or a combination thereof. In the foregoing embodiments, multiple steps or methods can be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented by hardware, as in another embodiment, they can be implemented by any one or a combination of the following technologies known in the art: discrete logic circuits having logic gate circuits for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and so on.
[0079] Those of ordinary skill in the art can understand that all or part of the steps carried in the methods of the foregoing embodiments can be completed by a program instructing relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
[0080] In addition, the functional units in the various embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer readable storage medium.
[0081] The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like. Although the embodiments of the present application have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limiting the present application. A person of ordinary skill in the art can make changes, modifications, replacements, and variations to the foregoing embodiments within the scope of the present application.