A training data evaluation method and related devices

By combining two sets of gating parameters to evaluate the importance of training data, the accuracy problem caused by the single evaluation method in the existing technology is solved, and training data selection with higher accuracy is achieved, thereby improving the model training effect.

CN116882472B · Active · Publication Date: 2026-06-19 · HUAWEI TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: HUAWEI TECH CO LTD
Filing Date: 2023-05-30
Publication Date: 2026-06-19

Application Information

Application: 30 May 2023
Publication: 19 Jun 2026 (CN116882472B)

IPC: G06N3/08; G06F18/214


Similar Technology Patents

Model training method and related equipment thereof
CN118262380A
Pre-training data processing method and device
CN120011767A
Multimedia content evaluation method and device and training method thereof
CN112183946A
Model evaluation method and device, electronic equipment and storage medium
CN113704082A
Artificial intelligence-based sample evaluation method, apparatus, device, and storage medium
WO2021121128A1


AI Technical Summary

Technical Problem

In existing technologies, methods for evaluating training data consider only one factor, so the evaluation values are inaccurate and important training data cannot be selected reliably.

Method used

Each training data point is paired with two sets of gating parameters: one set strengthens its role in model training and the other weakens it. A model is trained under each setting, and the importance of the training data point is evaluated from both models, so that its different effects on model training are considered together.

Benefits of technology

The accuracy of training data evaluation is improved, enabling more accurate selection of important training data and better model training performance.

✦ Generated by Eureka AI based on patent content.


Abstract

This application discloses a training data evaluation method and related equipment. The method considers a comprehensive range of factors in evaluating the importance of training data, resulting in highly accurate evaluation values and enabling precise selection of training data. The method includes: first, acquiring N training data points, N sets of first gating parameters, and N sets of second gating parameters. Next, training a model to be trained based on the N training data points and the i-th set of first gating parameters to obtain the i-th first model, and then training the model to be trained based on the N training data points and the i-th set of second gating parameters to obtain the i-th second model. Then, a series of processing steps are performed on the i-th first model and the i-th second model to obtain the evaluation value of the i-th training data point. In this way, the final evaluation values of the N training data points are obtained.


Description

Technical Field

[0001] This application relates to the field of artificial intelligence (AI) technology, and in particular to a training data evaluation method and related equipment.

Background Technology

[0002] In the training process of neural network models, a large amount of training data is typically used. To improve the performance of the trained neural network model, it is often necessary to select the more important training data and discard the less important training data, thereby training a better model. Before selecting the training data, it is usually necessary to evaluate the training data to determine its importance.

[0003] In related technologies, when multiple training data need to be evaluated, multiple gating parameters corresponding one-to-one with the training data can be obtained first. Next, these gating parameters can be applied to the training data to obtain processed training data. Then, the processed training data can be input into the model to be trained to obtain processing results. Finally, the gating parameters can be updated based on the processing results; the updated gating parameters serve as the evaluation values of the training data and indicate their importance. This completes the evaluation of the multiple training data.
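The related-art procedure described above can be illustrated with a minimal sketch. It assumes a toy linear model with fixed weights, gates that multiply each input, and gradient descent on squared error; the function name learn_gates and all numeric settings are hypothetical and are not taken from the patent.

```python
def learn_gates(samples, targets, weights, steps=500, lr=0.05):
    """Learn one gate per input by gradient descent on mean squared error.

    Assumed model: prediction = sum_j weights[j] * gates[j] * x[j].
    The model weights stay fixed so that only the gates are updated.
    """
    n = len(weights)
    gates = [1.0] * n  # every input starts fully "open"
    for _ in range(steps):
        grads = [0.0] * n
        for x, y in zip(samples, targets):
            pred = sum(w * g * xj for w, g, xj in zip(weights, gates, x))
            err = pred - y
            for j in range(n):
                # d(err^2)/d(gate_j), averaged over the samples
                grads[j] += 2.0 * err * weights[j] * x[j] / len(samples)
        gates = [g - lr * d for g, d in zip(gates, grads)]
    return gates  # the updated gates act as the evaluation values


# Toy data: the target equals the first input, the second input is irrelevant.
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [1.0, 0.0, 1.0]
gates = learn_gates(samples, targets, weights=[1.0, 1.0])
```

In this toy setting the gate of the relevant input stays near 1 and the gate of the irrelevant input shrinks toward 0, which is the single signal the related art relies on.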

[0004] In the above process, the importance of the multiple training data in model training is evaluated directly by a single, one-sided means (the updated gating parameters). This approach considers too few factors, so the evaluation values of the training data are inaccurate and the training data cannot be selected accurately.

Summary of the Invention

[0005] This application provides a training data evaluation method and related equipment. In the process of evaluating the importance of training data, the factors considered are more comprehensive, so the evaluation value of the obtained training data has high accuracy, and thus the training data can be accurately screened.

[0006] A first aspect of this application provides a training data evaluation method, characterized in that the method includes:

[0007] When N training data points need to be evaluated, the N training data points (e.g., user feature data or project feature data, etc.), N sets of first gating parameters, and N sets of second gating parameters can first be obtained, where N ≥ 2. The i-th training data point corresponds to the i-th set of first gating parameters in the N sets of first gating parameters, and also corresponds to the i-th set of second gating parameters in the N sets of second gating parameters. The i-th set of first gating parameters can be used to strengthen the role of the i-th training data point in the model training process, while the i-th set of second gating parameters can be used to weaken the role of the i-th training data point in the model training process.

[0008] After obtaining the N training data points, the i-th set of first gating parameters, and the i-th set of second gating parameters, the model to be trained can be trained using the N training data points and the i-th set of first gating parameters to obtain the i-th first model. Similarly, the model to be trained can be trained using the N training data points and the i-th set of second gating parameters to obtain the i-th second model. Since the i-th set of first gating parameters can be used to strengthen the role of the i-th training data point in the model training process, the first model trained based on the i-th set of first gating parameters will depend on the i-th training data point. Conversely, since the i-th set of second gating parameters can be used to weaken the role of the i-th training data point in the model training process, the second model trained based on the i-th set of second gating parameters will not depend on the i-th training data point.

[0009] After obtaining the i-th first model and the i-th second model, a series of processes can be performed on the i-th first model and the i-th second model to obtain the evaluation value of the i-th training data. Since i = 1, ..., N, the evaluation values of N training data can be obtained in the end, which are used to indicate the importance of the N training data. At this point, the evaluation of the N training data is completed.

[0010] As can be seen from the above method, since the i-th set of first gating parameters can be used to make the i-th first model dependent on the i-th training data, and the i-th set of second gating parameters can be used to make the i-th second model independent of the i-th training data, the model training process considers not only the case where the i-th training data plays a significant role, but also the case where the i-th training data plays a minor role. The importance of the i-th training data is comprehensively evaluated based on the models (the i-th first model and the i-th second model) obtained from these two cases. This method considers a relatively comprehensive range of factors, so the final evaluation values of the N training data have high accuracy, thus enabling accurate selection of training data.

[0011] In one possible implementation, training the model to be trained based on N training data and the i-th set of first gating parameters to obtain the i-th first model includes: performing an i-th first processing on the N training data based on the i-th set of first gating parameters to obtain N training data after the i-th first processing, wherein the i-th set of first gating parameters includes N first gating parameters that correspond one-to-one with the N training data, the i-th first gating parameter is within a first value range, and the other first gating parameters are within a second value range; training the model to be trained based on the N training data after the i-th first processing to obtain the i-th first model. In the aforementioned implementation, in the i-th set of first gating parameters, the i-th first gating parameter is within the first value range, so its value is used to strengthen the role of the i-th training data, while the remaining first gating parameters are within the second value range, so their values are used to weaken the role of the remaining training data. Therefore, with the i-th set of first gating parameters applied to the N training data, the i-th first model obtained by training depends on the i-th training data.

[0012] In one possible implementation, training the model to be trained based on N training data and the i-th set of second gating parameters to obtain the i-th second model includes: performing an i-th second processing on the N training data based on the i-th set of second gating parameters to obtain N training data after the i-th second processing, wherein the i-th set of second gating parameters contains N second gating parameters that correspond one-to-one with the N training data, the i-th second gating parameter is within a second value range, and the other second gating parameters are within a first value range; training the model to be trained based on the N training data after the i-th second processing to obtain the i-th second model. In the aforementioned implementation, in the i-th set of second gating parameters, the i-th second gating parameter is within the second value range, so its value is used to weaken the role of the i-th training data, while the remaining second gating parameters are within the first value range, so their values are used to strengthen the role of the remaining training data. Therefore, with the i-th set of second gating parameters applied to the N training data, the i-th second model obtained by training does not depend on the i-th training data.
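The two kinds of gating parameter sets and the corresponding first and second processing can be sketched as follows. This is only an illustration under stated assumptions: the first value range is represented by a single value strong=1.0, the second value range by weak=0.1, each training data point is a single number, and the processing is element-wise multiplication. None of these choices, nor the function names, come from the patent. Indices are 0-based in the code, whereas the text counts from 1.

```python
def first_gate_set(n, i, strong=1.0, weak=0.1):
    """i-th set of first gating parameters: strengthen data i, weaken the rest."""
    return [strong if j == i else weak for j in range(n)]


def second_gate_set(n, i, strong=1.0, weak=0.1):
    """i-th set of second gating parameters: weaken data i, strengthen the rest."""
    return [weak if j == i else strong for j in range(n)]


def apply_gates(data, gates):
    """Assumed form of the first/second processing: scale each item by its gate."""
    if len(data) != len(gates):
        raise ValueError("need exactly one gating parameter per training data point")
    return [g * x for g, x in zip(gates, data)]


n = 3
data = [2.0, 3.0, 4.0]
# One first set and one second set per training data point: N sets of each kind.
first_sets = [first_gate_set(n, i) for i in range(n)]
second_sets = [second_gate_set(n, i) for i in range(n)]
after_first = apply_gates(data, first_sets[1])    # data 1 emphasised
after_second = apply_gates(data, second_sets[1])  # data 1 suppressed
```

Keeping the two sets as mirror images of each other makes the only difference between the i-th first model and the i-th second model the treatment of the i-th training data.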

[0013] In one possible implementation, training the model to be trained based on the N training data after the i-th first processing to obtain the i-th first model includes: inputting the N training data after the i-th first processing into the model to be trained to obtain the i-th first processing result; updating the parameters of the model to be trained based on the i-th first processing result and the i-th first learning rate to obtain the i-th first model, where the i-th first learning rate is used to ensure that the difference between the performance of the i-th first model and the performance of the model to be trained is within a third value range. In the aforementioned implementation, the value of the i-th first learning rate can be dynamically adjusted. Initially, the i-th first learning rate can be a preset value. After training the model to be trained using the i-th first gradient and the i-th first learning rate to obtain the i-th first model, if the difference between the performance of the i-th first model and the performance of the model to be trained is outside the third value range, the value of the i-th first learning rate can be readjusted, and the model to be trained can be retrained using the i-th first gradient and the i-th first learning rate until the difference between the performance of the i-th first model and the performance of the model to be trained is within the third value range. In this way, the lower limit of the performance of the i-th first model can be effectively guaranteed. Therefore, obtaining the evaluation value of the i-th training data based on the i-th first model can further improve the accuracy of the evaluation value of the i-th training data.

[0014] In one possible implementation, training the model to be trained based on the N training data after the i-th second processing to obtain the i-th second model includes: inputting the N training data after the i-th second processing into the model to be trained to obtain the i-th second processing result; updating the parameters of the model to be trained based on the i-th second processing result and the i-th second learning rate to obtain the i-th second model, where the i-th second learning rate is used to ensure that the difference between the performance of the i-th second model and the performance of the model to be trained is within a third value range. In the aforementioned implementation, the value of the i-th second learning rate can be dynamically adjusted. Initially, the i-th second learning rate can be a preset value. After training the model to be trained using the i-th second gradient and the i-th second learning rate to obtain the i-th second model, if the difference between the performance of the i-th second model and the performance of the model to be trained is outside the third value range, the value of the i-th second learning rate can be readjusted, and the model to be trained can be retrained using the i-th second gradient and the i-th second learning rate until the difference between the performance of the i-th second model and the performance of the model to be trained is within the third value range. In this way, the lower limit of the performance of the i-th second model can be effectively guaranteed. Therefore, obtaining the evaluation value of the i-th training data based on the i-th second model can further improve the accuracy of the evaluation value of the i-th training data.
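The dynamic adjustment of the i-th first or second learning rate can be sketched as a retry loop. The sketch assumes that the third value range is a symmetric tolerance around zero, that the learning rate is readjusted by halving, and that performance is supplied as a callable; the names update_with_bounded_drift, tolerance, and shrink are hypothetical.

```python
def update_with_bounded_drift(params, grad, performance, lr=1.0,
                              tolerance=0.05, shrink=0.5, max_tries=30):
    """Take one gradient step, shrinking lr until performance drift is bounded.

    params      -- parameters of the model to be trained
    grad        -- gradient obtained from the gated training data
    performance -- callable mapping parameters to a performance score
    Returns the updated parameters and the learning rate that was accepted.
    """
    baseline = performance(params)
    for _ in range(max_tries):
        candidate = [p - lr * g for p, g in zip(params, grad)]
        # Accept only if the new model stays close to the model to be trained.
        if abs(performance(candidate) - baseline) <= tolerance:
            return candidate, lr
        lr *= shrink  # readjust the learning rate and retrain from params
    raise RuntimeError("no learning rate kept the performance gap within tolerance")


# Toy usage: performance is simply the value of the single parameter.
new_params, used_lr = update_with_bounded_drift(
    params=[1.0], grad=[1.0], performance=lambda p: p[0])
```

Each retry restarts from the parameters of the model to be trained, matching the description that the model is retrained with the readjusted learning rate rather than updated cumulatively.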

[0015] In one possible implementation, obtaining the evaluation value of the i-th training data based on the i-th first model and the i-th second model includes: acquiring N new training data points that correspond one-to-one with the N training data points, where the N training data points originate from a first dataset and the N new training data points originate from a second dataset; processing the N new training data points using the i-th first model to obtain the upper limit of the evaluation value of the i-th training data point; processing the N new training data points using the i-th second model to obtain the lower limit of the evaluation value of the i-th training data point; and obtaining the evaluation value of the i-th training data point based on the upper limit and lower limit of the evaluation value of the i-th training data point. In this approach, the performance of the i-th first model and the i-th second model can be tested using the N new training data points that correspond one-to-one with the N training data points, thereby accurately determining the upper limit and lower limit of the evaluation value of the i-th training data point, and thus obtaining the evaluation value range (importance range) of the i-th training data point. Based on this range, the evaluation value of the i-th training data point can be given more accurately.
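The text does not state how the evaluation value is derived from the two limits. One possible choice, shown here purely as an assumption, is a weighted point inside the range; the function name combine_limits and the parameter alpha are hypothetical.

```python
def combine_limits(upper, lower, alpha=0.5):
    """Pick an evaluation value inside the range [lower, upper].

    alpha = 0 returns the lower limit, alpha = 1 the upper limit, and the
    default 0.5 returns the midpoint. This rule is an illustrative assumption.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    low, high = min(lower, upper), max(lower, upper)  # tolerate swapped limits
    return low + alpha * (high - low)


value = combine_limits(upper=4.0, lower=1.0)
```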

[0016] In one possible implementation, processing N new training data using the i-th first model to obtain the upper limit of the evaluation value of the i-th training data includes: inputting N new training data into the i-th first model to obtain the i-th third processing result; inputting the remaining new training data (excluding the i-th new training data) and the i-th new training data after perturbation into the i-th first model to obtain the i-th fourth processing result; and obtaining the upper limit of the evaluation value of the i-th training data based on the i-th third processing result and the i-th fourth processing result. In the aforementioned implementation, since the i-th new training data corresponds to the i-th training data (the type is the same), the difference between the performance of the i-th first model on the i-th new training data (which can be obtained through the i-th third processing result) and the performance of the i-th first model on the i-th new training data after perturbation (which can be obtained through the i-th fourth processing result) can be tested, and this difference can be used as the upper limit of the evaluation value of the i-th training data.

[0017] In one possible implementation, processing N new training data using the i-th second model to obtain the lower bound of the evaluation value of the i-th training data includes: inputting N new training data into the i-th second model to obtain the i-th fifth processing result; inputting the remaining new training data (excluding the i-th new training data) and the i-th new training data after perturbation into the i-th second model to obtain the i-th sixth processing result; and obtaining the lower bound of the evaluation value of the i-th training data based on the i-th fifth processing result and the i-th sixth processing result. In the aforementioned implementation, since the i-th new training data corresponds to the i-th training data (the type is the same), the difference between the performance of the i-th second model on the i-th new training data (which can be obtained through the i-th fifth processing result) and the performance of the i-th second model on the i-th new training data after perturbation (which can be obtained through the i-th sixth processing result) can be tested, and this difference can be used as the lower bound of the evaluation value of the i-th training data.
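The perturbation test used for both the upper and the lower limit can be sketched with a toy linear model. The sketch assumes that performance is the negative mean squared error against known targets and that perturbation means replacing the i-th input with zero; the patent does not fix either choice, and all names below are hypothetical.

```python
def performance(weights, samples, targets):
    """Negative mean squared error of a linear model (higher is better)."""
    total = 0.0
    for x, y in zip(samples, targets):
        pred = sum(w * xj for w, xj in zip(weights, x))
        total += (pred - y) ** 2
    return -total / len(samples)


def perturb(samples, i, value=0.0):
    """Replace input i in every sample; the other inputs are left untouched."""
    return [[value if j == i else xj for j, xj in enumerate(x)] for x in samples]


def perturbation_gap(weights, samples, targets, i):
    """Performance difference between the original and the perturbed inputs."""
    original = performance(weights, samples, targets)    # third / fifth result
    disturbed = performance(weights, perturb(samples, i), targets)  # fourth / sixth
    return abs(original - disturbed)


# New data from a second dataset (toy values); the target equals input 0.
new_samples = [[2.0, 1.0], [-2.0, -1.0], [2.0, -1.0], [-2.0, 1.0]]
new_targets = [2.0, -2.0, 2.0, -2.0]
first_model = [1.0, 0.0]   # assumed weights of a model that relies on input 0
second_model = [0.1, 0.0]  # assumed weights of a model that barely uses input 0
upper = perturbation_gap(first_model, new_samples, new_targets, i=0)
lower = perturbation_gap(second_model, new_samples, new_targets, i=0)
```

With these toy weights the model that relies on input 0 loses far more performance under perturbation than the model that barely uses it, which is why the first gap serves as the upper limit and the second as the lower limit.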

[0018] In one possible implementation, the method further includes: selecting M training data points (N≥M≥1) from the N training data points based on their evaluation values; updating the parameters of the model to be trained based on the M training data points until the model training conditions are met, thus obtaining a third model. In the aforementioned implementation, after obtaining the evaluation values of the N training data points, the top M training data points with the highest evaluation values can be selected. These M training data points can then be used as the current batch of training data and input into the model to be trained to obtain the corresponding processing result. The parameters of the model to be trained are then updated based on this processing result, resulting in a model with updated parameters. The updated model is then trained using the next batch of training data (the next batch of training data contains M training data points that correspond one-to-one with the current batch of training data points) until the model training conditions are met, thus obtaining the third model.
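Selecting the M training data points with the highest evaluation values can be sketched as below. The function name select_top_m is hypothetical, and returning the kept positions alongside the data is an added convenience so that the same positions can be reused for the next batch.

```python
def select_top_m(data, scores, m):
    """Keep the m items with the highest evaluation values.

    Returns the selected items in their original order together with their
    0-based positions.
    """
    if len(data) != len(scores):
        raise ValueError("need one evaluation value per training data point")
    if not 1 <= m <= len(data):
        raise ValueError("m must satisfy N >= m >= 1")
    ranked = sorted(range(len(data)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:m])  # restore original ordering of the kept items
    return [data[j] for j in keep], keep


selected, positions = select_top_m(["a", "b", "c", "d"], [0.2, 0.9, 0.1, 0.5], m=2)
```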

[0019] A second aspect of this application provides a training data evaluation apparatus, comprising: an acquisition module for acquiring N training data, N sets of first gating parameters, and N sets of second gating parameters, where N ≥ 2; a first training module for training a model to be trained based on the N training data and the i-th set of first gating parameters to obtain an i-th first model, wherein the i-th set of first gating parameters is used to make the i-th first model dependent on the i-th training data, i = 1, ..., N; a second training module for training the model to be trained based on the N training data and the i-th set of second gating parameters to obtain an i-th second model, wherein the i-th set of second gating parameters is used to make the i-th second model independent of the i-th training data; and an evaluation module for acquiring an evaluation value of the i-th training data based on the i-th first model and the i-th second model, wherein the evaluation value of the i-th training data is used to indicate the importance of the i-th training data.

[0020] As can be seen from the above apparatus, when N training data points need to be evaluated, the N training data points, N sets of first gating parameters, and N sets of second gating parameters can be obtained first. Then, the model to be trained can be trained based on the N training data points and the i-th set of first gating parameters to obtain the i-th first model, and the model to be trained can also be trained based on the N training data points and the i-th set of second gating parameters to obtain the i-th second model. Finally, a series of processing steps can be applied to the i-th first model and the i-th second model to obtain the evaluation value of the i-th training data point. In this way, the final evaluation values for the N training data points can be obtained. In the aforementioned process, since the i-th set of first gating parameters can be used to make the i-th first model dependent on the i-th training data, and the i-th set of second gating parameters can be used to make the i-th second model independent of the i-th training data, the model training process considers not only the case where the i-th training data plays a significant role, but also the case where the i-th training data plays a minor role. The importance of the i-th training data is comprehensively evaluated based on the models (the i-th first model and the i-th second model) obtained from these two cases. This method considers a relatively comprehensive range of factors, so the final evaluation values of the N training data have high accuracy, thus enabling accurate selection of training data.

[0021] In one possible implementation, the first training module is configured to: perform a first processing on N training data based on the i-th set of first gating parameters to obtain N training data after the i-th first processing, wherein the i-th set of first gating parameters includes N first gating parameters that correspond one-to-one with the N training data, the i-th first gating parameter is within a first value range, and the remaining first gating parameters are within a second value range; and train the model to be trained based on the N training data after the i-th first processing to obtain the i-th first model.

[0022] In one possible implementation, the second training module is used to: perform a second processing on N training data based on the i-th set of second gating parameters to obtain N training data after the i-th second processing, wherein the i-th set of second gating parameters includes N second gating parameters that correspond one-to-one with the N training data, the i-th second gating parameter is within a second value range, and the remaining second gating parameters are within a first value range; and train the model to be trained based on the N training data after the i-th second processing to obtain the i-th second model.

[0023] In one possible implementation, the first training module is used to: input N training data after the i-th first processing into the model to be trained to obtain the i-th first processing result; update the parameters of the model to be trained based on the i-th first processing result and the i-th first learning rate to obtain the i-th first model, wherein the i-th first learning rate is used to ensure that the difference between the performance of the i-th first model and the performance of the model to be trained is within a third value range.

[0024] In one possible implementation, the second training module is used to: input N training data after the i-th second processing into the model to be trained to obtain the i-th second processing result; update the parameters of the model to be trained based on the i-th second processing result and the i-th second learning rate to obtain the i-th second model, wherein the i-th second learning rate is used to ensure that the difference between the performance of the i-th second model and the performance of the model to be trained is within a third value range.

[0025] In one possible implementation, the evaluation module is used to: acquire N new training data points, each corresponding one-to-one with one of the existing training data points, where the existing training data points originate from a first dataset and the new training data points originate from a second dataset; process the N new training data points using the i-th first model to obtain the upper bound of the evaluation value of the i-th training data point; process the N new training data points using the i-th second model to obtain the lower bound of the evaluation value of the i-th training data point; and obtain the evaluation value of the i-th training data point based on the upper bound and the lower bound of the evaluation value of the i-th training data point.

[0026] In one possible implementation, the evaluation module is used to: input N new training data into the i-th first model to obtain the i-th third processing result; input the remaining new training data other than the i-th new training data and the i-th new training data after perturbation into the i-th first model to obtain the i-th fourth processing result; and obtain the upper limit of the evaluation value of the i-th training data based on the i-th third processing result and the i-th fourth processing result.

[0027] In one possible implementation, the evaluation module is used to: input N new training data into the i-th second model to obtain the i-th fifth processing result; input the remaining new training data other than the i-th new training data and the i-th new training data after perturbation into the i-th second model to obtain the i-th sixth processing result; and obtain the lower limit of the evaluation value of the i-th training data based on the i-th fifth processing result and the i-th sixth processing result.

[0028] In one possible implementation, the device further includes: a selection module for selecting M training data from the N training data based on the evaluation values of the N training data, where N≥M≥1; and a third training module for updating the parameters of the model to be trained based on the M training data until the model training conditions are met, thereby obtaining a third model.

[0029] A third aspect of this application provides a training data evaluation apparatus, which includes a memory and a processor; the memory stores code, and the processor is configured to execute the code. When the code is executed, the training data evaluation apparatus performs the method described in the first aspect or any possible implementation thereof.

[0030] A fourth aspect of this application provides a circuit system including a processing circuit configured to perform the method as described in the first aspect or any possible implementation thereof.

[0031] A fifth aspect of this application provides a chip system including a processor for calling a computer program or computer instructions stored in a memory to cause the processor to perform the method as described in the first aspect or any possible implementation thereof.

[0032] In one possible implementation, the processor is coupled to the memory via an interface.

[0033] In one possible implementation, the chip system also includes a memory that stores computer programs or computer instructions.

[0034] A sixth aspect of this application provides a computer storage medium storing a computer program that, when executed by a computer, causes the computer to perform the method as described in the first aspect or any possible implementation thereof.

[0035] A seventh aspect of this application provides a computer program product storing instructions that, when executed by a computer, cause the computer to perform the method as described in the first aspect or any possible implementation thereof.

[0036] In this embodiment, when N training data points need to be evaluated, the N training data points, N sets of first gating parameters, and N sets of second gating parameters can be obtained first. Then, the model to be trained can be trained based on the N training data points and the i-th set of first gating parameters to obtain the i-th first model, and the model to be trained can be trained based on the N training data points and the i-th set of second gating parameters to obtain the i-th second model. Then, a series of processing steps can be performed on the i-th first model and the i-th second model to obtain the evaluation value of the i-th training data point. In this way, the final evaluation values for the N training data points can be obtained. In the aforementioned process, since the i-th set of first gating parameters can be used to make the i-th first model dependent on the i-th training data, and the i-th set of second gating parameters can be used to make the i-th second model independent of the i-th training data, the model training process considers not only the case where the i-th training data plays a significant role, but also the case where the i-th training data plays a minor role. The importance of the i-th training data is comprehensively evaluated based on the models (the i-th first model and the i-th second model) obtained from these two cases. This method considers a relatively comprehensive range of factors, so the final evaluation values of the N training data have high accuracy, thus enabling accurate selection of training data.

Attached Figure Description

[0037] Figure 1 is a structural diagram illustrating the main framework of artificial intelligence;

[0038] Figure 2a is a schematic diagram of the structure of the data processing system provided in the embodiments of this application;

[0039] Figure 2b is another schematic diagram of the data processing system provided in the embodiments of this application;

[0040] Figure 2c is a schematic diagram of the data processing related equipment provided in the embodiments of this application;

[0041] Figure 3 is a schematic diagram of the architecture of the system 100 provided in the embodiments of this application;

[0042] Figure 4 is a schematic diagram of the structure of the project recommendation system provided in the embodiments of this application;

[0043] Figure 5 is a flowchart illustrating the training data evaluation method provided in the embodiments of this application;

[0044] Figure 6 is a schematic diagram illustrating an application example of the training data evaluation method provided in the embodiments of this application;

[0045] Figure 7 is another schematic diagram illustrating an application example of the training data evaluation method provided in the embodiments of this application;

[0046] Figure 8 is a flowchart illustrating the data processing method provided in the embodiments of this application;

[0047] Figure 9 is a schematic diagram of the training data evaluation device provided in the embodiments of this application;

[0048] Figure 10 is a schematic diagram of the structure of the data processing apparatus provided in the embodiments of this application;

[0049] Figure 11 is a schematic diagram of the structure of the execution device provided in the embodiments of this application;

[0050] Figure 12 is a schematic diagram of the structure of the training device provided in the embodiments of this application;

[0051] Figure 13 is a schematic diagram of the structure of a chip provided in the embodiments of this application.

Detailed Implementation

[0052] This application provides a training data evaluation method and related devices. Because more factors are considered when evaluating the importance of training data, the resulting evaluation values are more accurate, and the training data can therefore be screened accurately.

[0053] The terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such terms are interchangeable where appropriate; this is merely a way of distinguishing objects with the same attributes in the embodiments of this application. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion, so that a process, method, system, product, or apparatus that comprises a series of elements is not necessarily limited to those elements, but may include other elements not explicitly listed or inherent to those processes, methods, products, or apparatuses.

[0054] In the training process of neural network models, a large amount of training data is typically used. To improve the performance of the trained neural network model, it is often necessary to select the more important training data and discard the less important training data, thereby training a better model. Before selecting the training data, it is usually necessary to evaluate the training data to determine its importance.

[0055] In related technologies, when multiple training data need to be evaluated, multiple gating parameters corresponding one-to-one with each training data point can be obtained first. Next, these gating parameters can be applied to the multiple training data points to obtain processed training data. Then, the processed training data can be input into the model to be trained to obtain the processing results. Finally, the gating parameters can be updated based on the processing results; the updated gating parameters are the evaluation values of the multiple training data points, indicating their importance. Subsequently, based on the evaluation values of the multiple training data points, several more important training data points can be selected and used to train the model to obtain the target model.
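The related-technology flow above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method of any particular prior system: the stand-in model (a gated sum with no other weights), the function name `learn_gates`, the toy data, and the learning rate and step count are all hypothetical.

```python
def learn_gates(samples, targets, lr=0.05, steps=500):
    """Learn one gating parameter per training data point (feature field).

    Assumed stand-in model: pred = sum(gate_k * x_k), with no other weights.
    The gates are updated by gradient descent on the mean squared error.
    """
    n = len(samples[0])
    gates = [0.5] * n                      # one gate per feature field
    for _ in range(steps):
        grads = [0.0] * n
        for x, t in zip(samples, targets):
            # Apply the gates to the data, then feed the result to the model.
            pred = sum(g * v for g, v in zip(gates, x))
            for k in range(n):
                grads[k] += 2.0 * (pred - t) * x[k] / len(samples)
        # Update the gates based on the processing results.
        gates = [g - lr * d for g, d in zip(gates, grads)]
    return gates


# Toy data: the target equals the first feature; the second feature is noise.
samples = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]]
targets = [1.0, 2.0, 3.0, 4.0]
gates = learn_gates(samples, targets)      # updated gates = evaluation values
```

In this toy run the first feature fully determines the target, so its gate moves toward 1 while the gate of the uninformative second feature moves toward 0. The updated gates are then read directly as the evaluation values, which is the single measure that paragraph [0056] criticizes.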

[0056] In the above process, the importance of the multiple training data points in the model training process is evaluated directly by a single, one-sided means (the updated gating parameters). Because this approach considers few factors, the resulting evaluation values are inaccurate, and the training data cannot be selected accurately.

[0057] To address the aforementioned problems, this application provides a training data evaluation method that can be implemented in conjunction with artificial intelligence (AI) technology. AI technology is a discipline that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence. AI technology achieves optimal results by perceiving the environment, acquiring knowledge, and using that knowledge. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new type of intelligent machine that can react in a way similar to human intelligence. Using artificial intelligence for data processing is a common application of AI.

[0058] First, the overall workflow of an artificial intelligence system is described. Refer to Figure 1, which is a structural diagram of the main framework of artificial intelligence. The framework is explained below along two dimensions: the "Intelligent Information Chain" (horizontal axis) and the "IT Value Chain" (vertical axis). The "Intelligent Information Chain" reflects a series of processes from data acquisition to processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data is progressively condensed from data to information to knowledge to wisdom. The "IT Value Chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (the technologies that provide and process it) up to the industrial ecosystem of the system.

[0059] (1) Infrastructure

[0060] Infrastructure provides computing power to support artificial intelligence systems, enabling communication with the external world and providing support through a basic platform. This communication occurs through sensors; computing power is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); and the basic platform includes distributed computing frameworks and related platform guarantees and support, which may include cloud storage and computing, interconnected networks, etc. For example, sensors communicate with the outside world to acquire data, and this data is provided to intelligent chips in the distributed computing system provided by the basic platform for computation.

[0061] (2) Data

[0062] The data at the layer above the infrastructure represents the data sources in the field of artificial intelligence. The data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data from existing systems and sensor data such as force, displacement, liquid level, temperature, and humidity.

[0063] (3) Data processing

[0064] Data processing typically includes methods such as data training, machine learning, deep learning, search, reasoning, and decision-making.

[0065] Among them, machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, and training on data.

[0066] Reasoning refers to the process in which, in a computer or intelligent system, the machine thinks and solves problems by simulating human intelligent reasoning, based on reasoning control strategies and using formalized information. Typical functions include search and matching.

[0067] Decision-making refers to the process of making decisions based on intelligent information after reasoning, and it typically provides functions such as classification, sorting, and prediction.

[0068] (4) General capabilities

[0069] After the data processing mentioned above, the results of the data processing can be used to form some general capabilities, such as algorithms or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.

[0070] (5) Smart Products and Industry Applications

[0071] Intelligent products and industry applications refer to products and applications of artificial intelligence systems in various fields. They are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. Their application areas mainly include: intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, etc.

[0072] The following section introduces several application scenarios for this application.

[0073] Figure 2a is a schematic structural diagram of a data processing system provided in an embodiment of this application. The data processing system includes user equipment and data processing equipment. The user equipment includes smart terminals such as mobile phones, personal computers, or information processing centers. The user equipment is the initiator of data processing: a user typically initiates a data processing request through the user equipment.

[0074] The data processing equipment can be a cloud server, network server, application server, management server, or any other device or server with data processing capabilities. The data processing equipment receives data processing requests from the smart terminal through an interactive interface, and then performs data processing such as machine learning, deep learning, search, reasoning, and decision-making using a storage device and a data processing processor. The storage device in the data processing equipment is a general term that covers local storage and a database storing historical data. The database can be located on the data processing equipment or on another network server.

[0075] In the data processing system shown in Figure 2a, the user equipment can receive user instructions. For example, the user equipment can acquire target data input or selected by the user and then send a request to the data processing equipment, so that the data processing equipment performs a series of processing steps on the target data from the user equipment and obtains the processing result of the target data. For instance, the user equipment can acquire the user feature data and the feature data of multiple items input by the user, and then send a data processing request to the data processing equipment. Based on that request, the data processing equipment performs a series of processing steps on the user feature data and the feature data of the multiple items to obtain the processing result, i.e., the probabilities that the multiple items can be recommended to the user.

[0076] In Figure 2a, the data processing equipment can execute the data processing method of the embodiments of this application.

[0077] Figure 2b is another schematic structural diagram of the data processing system provided in the embodiments of this application. In Figure 2b, the user equipment directly serves as the data processing equipment: it can acquire input from the user and process it with its own hardware. The specific process is similar to that of Figure 2a; refer to the description above, which is not repeated here.

[0078] In the data processing system shown in Figure 2b, the user equipment can receive user instructions. For example, the user equipment can acquire target data input by the user and then perform a series of processing steps on the target data to obtain the processing result of the target data. For instance, the user equipment can acquire the user feature data and the feature data of multiple items input by the user, and then perform a series of processing steps on them to obtain the processing result, i.e., the probabilities that the multiple items can be recommended to the user.

[0079] In Figure 2b, the user equipment itself can execute the data processing method of the embodiments of this application.

[0080] Figure 2c is a schematic diagram of the data processing related equipment provided in an embodiment of this application.

[0081] The user equipment in Figures 2a and 2b above can specifically be local device 301 or local device 302 in Figure 2c. The data processing equipment in Figure 2a can specifically be execution device 210 in Figure 2c, where data storage system 250 can store the data to be processed by execution device 210. Data storage system 250 can be integrated into execution device 210, or deployed in the cloud or on another network server.

[0082] The processors in Figures 2a and 2b can perform data training, machine learning, or deep learning using neural network models or other models (e.g., models based on support vector machines), and then use the trained or learned models to perform data processing applications on images to obtain the corresponding processing results.

[0083] Figure 3 is a schematic diagram of the architecture of the system 100 provided in the embodiments of this application. In Figure 3, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices. Users can input data to the I/O interface 112 through the client device 140. In the embodiments of this application, the input data may include various scheduled tasks, callable resources, and other parameters.

[0084] While the execution device 110 preprocesses the input data, or while the calculation module 111 of the execution device 110 performs calculations and other related processing (such as implementing the function of the neural network model in this application), the execution device 110 may call data, code, etc. in the data storage system 150 for the corresponding processing, and may also store the data, instructions, etc. obtained from that processing into the data storage system 150.

[0085] Finally, the I/O interface 112 returns the processing result to the client device 140, which provides it to the user.

[0086] It is worth noting that the training device 120 can generate corresponding target models/rules based on different training data for different objectives or tasks. These target models/rules can then be used to achieve those objectives or complete those tasks, thereby providing the user with the required results. The training data can be stored in the database 130 and originates from training samples collected by the data acquisition device 160.

[0087] In the scenario shown in Figure 3, the user can manually provide the input data through the interface provided by the I/O interface 112. Alternatively, the client device 140 can automatically send input data to the I/O interface 112. If the client device 140 needs the user's authorization to send input data automatically, the user can set the corresponding permissions in the client device 140. The user can view the output results of the execution device 110 on the client device 140, presented in forms such as display, sound, or animation. The client device 140 can also act as a data acquisition terminal, collecting the input data fed to the I/O interface 112 and the output results returned by the I/O interface 112 as new sample data and storing them in the database 130. Alternatively, without going through the client device 140, the I/O interface 112 can directly store the input data it receives and the output results it returns in the database 130 as new sample data.

[0088] It is worth noting that Figure 3 is merely a schematic diagram of a system architecture provided in an embodiment of this application. The positional relationships between the devices, components, modules, etc., shown in the figure do not constitute any limitation. For example, in Figure 3 the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 can also be placed within the execution device 110. As shown in Figure 3, a neural network can be obtained through training by the training device 120.

[0089] This application also provides a chip that includes a neural network processor (NPU). The chip can be placed in the execution device 110 shown in Figure 3 to perform the calculations of the calculation module 111. The chip can also be placed in the training device 120 shown in Figure 3 to complete the training work of the training device 120 and output the target model/rules.

[0090] The neural network processor (NPU) is mounted as a coprocessor on the host central processing unit (host CPU), which assigns tasks to it. The core of the NPU is the arithmetic circuit; a controller directs the arithmetic circuit to fetch data from memory (the weight memory or the input memory) and perform calculations.

[0091] In some implementations, the arithmetic circuit includes multiple processing engines (PEs). In some implementations, the arithmetic circuit is a two-dimensional systolic array. The arithmetic circuit can also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit is a general-purpose matrix processor.

[0092] For example, suppose there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data of matrix B from the weight memory and caches it in each PE of the arithmetic circuit. The arithmetic circuit then fetches the data of matrix A from the input memory, performs the matrix operation with matrix B, and stores the partial or final results of the resulting matrix in the accumulator.
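The data flow of this example can be sketched in plain Python. The sketch only illustrates how partial results build up in an accumulator; it does not model the actual circuit or a systolic array. The function name `matmul_accumulate` and the list-of-lists matrix layout are assumptions made for illustration.

```python
def matmul_accumulate(a, b):
    """Compute C = A x B, accumulating partial sums one inner index at a time."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions of A and B must match")
    # The accumulator holds partial results until every inner index is processed.
    acc = [[0.0] * cols for _ in range(rows)]
    for k in range(inner):                      # one pass per inner index
        for i in range(rows):
            for j in range(cols):
                acc[i][j] += a[i][k] * b[k][j]  # add one partial product
    return acc                                  # final result of matrix C


A = [[1.0, 2.0], [3.0, 4.0]]   # input matrix (from the input memory)
B = [[5.0, 6.0], [7.0, 8.0]]   # weight matrix (from the weight memory)
C = matmul_accumulate(A, B)    # [[19.0, 22.0], [43.0, 50.0]]
```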

[0093] The vector computation unit can further process the output of the arithmetic circuit, for example through vector multiplication, vector addition, exponentiation, logarithmic operations, and magnitude comparison. For example, the vector computation unit can be used for computation in the non-convolutional/non-fully-connected (non-FC) layers of a neural network, such as pooling, batch normalization, and local response normalization.

[0094] In some implementations, the vector computation unit can store the processed output vector into a unified buffer. For example, the vector computation unit can apply a nonlinear function to the output of the arithmetic circuit, such as a vector of accumulated values, to generate activation values. In some implementations, the vector computation unit generates normalized values, merged values, or both. In some implementations, the processed output vector can be used as activation input to the arithmetic circuit, for example, for use in subsequent layers of a neural network.

[0095] The unified memory is used to store input data and output data.

[0096] The direct memory access controller (DMAC) transfers input data from the external memory to the input memory and/or the unified memory, stores weight data from the external memory in the weight memory, and stores data from the unified memory in the external memory.

[0097] The bus interface unit (BIU) enables interaction among the host CPU, the DMAC, and the instruction fetch buffer via a bus.

[0098] The instruction fetch buffer, connected to the controller, stores the instructions used by the controller.

[0099] The controller invokes the instructions cached in the instruction fetch buffer to control the operation of the computing accelerator.

[0100] Generally, the unified memory, input memory, weight memory, and instruction fetch buffer are all on-chip memories, while the external memory is memory outside the NPU. The external memory can be double data rate synchronous dynamic random access memory (DDR SDRAM), high bandwidth memory (HBM), or another readable and writable memory.

[0101] Since the embodiments of this application involve a large number of neural network applications, for ease of understanding, the relevant terms and concepts such as neural networks involved in the embodiments of this application will be introduced below.

[0102] (1) Neural Network

[0103] A neural network can be composed of neural units. A neural unit can be an operational unit that takes x_s and an intercept of 1 as inputs, and whose output can be:

[0104] $$h_{W,b}(x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

[0105] Where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces nonlinear characteristics into the neural network by converting the input signal of the neural unit into an output signal. The output signal of the activation function can be used as the input of the next convolutional layer. The activation function can be the sigmoid function. A neural network is a network formed by connecting many such neural units together; that is, the output of one neural unit can be the input of another. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of that local receptive field, which can be a region composed of several neural units.
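As a small worked example, the neural unit described above (a weighted sum of the inputs plus a bias, passed through a sigmoid activation) might be written as follows. The function names `sigmoid` and `neural_unit` and the sample values are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Sigmoid activation function f."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(xs, ws, b, f=sigmoid):
    """Output of one neural unit: f(sum_s W_s * x_s + b)."""
    if len(xs) != len(ws):
        raise ValueError("each input x_s needs exactly one weight W_s")
    return f(sum(w * x for w, x in zip(ws, xs)) + b)


# 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5
out = neural_unit(xs=[1.0, 2.0], ws=[0.5, -0.25], b=0.0)
```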

[0106] The work of each layer in a neural network can be described by the mathematical expression y = a(Wx + b). From a physical perspective, each layer transforms the input space (the set of input vectors) into the output space (i.e., from the row space to the column space of a matrix) through five operations on the input space: 1. raising/lowering the dimension; 2. enlarging/shrinking; 3. rotation; 4. translation; 5. "bending". Operations 1, 2, and 3 are performed by Wx, operation 4 by +b, and operation 5 by a(). The word "space" is used because the object being classified is not a single thing but a class of things, and the space is the set of all individuals of that class. W is a weight vector, and each value in the vector represents the weight of one neuron in that layer of the neural network. The vector W determines the spatial transformation from the input space to the output space described above; that is, the W of each layer controls how the space is transformed. The purpose of training a neural network is ultimately to obtain the weight matrices of all layers of the trained network (the weight matrix formed by the vectors W of many layers). The training process of a neural network is therefore essentially learning how to control the spatial transformation, and more specifically, learning the weight matrices.

[0107] Because we want the output of the neural network to be as close as possible to the value we actually want it to predict, we can compare the current network's predicted value with the desired target value and then update the weight vector of each layer based on the difference between the two (there is usually an initialization process before the first update, in which parameters are pre-configured for each layer of the network). For example, if the network's predicted value is too high, the weight vectors are adjusted so that it predicts lower, and this adjustment continues until the network can predict the desired target value. It is therefore necessary to define in advance how to measure the difference between the predicted value and the target value. This is the role of the loss function or objective function, which are important equations for measuring that difference. Taking the loss function as an example, the higher its output value (the loss), the greater the difference, so training the neural network becomes a process of reducing this loss as much as possible.
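The compare-and-adjust loop described above can be illustrated with a deliberately tiny model. The sketch assumes a single-weight linear model, a squared-error loss, and plain gradient descent; none of these choices is stated in this application, and the names `mse_loss` and `train_linear` are hypothetical.

```python
def mse_loss(pred, target):
    """Squared difference between the predicted value and the target value."""
    return (pred - target) ** 2


def train_linear(x, target, w=0.0, lr=0.1, steps=50):
    """Fit pred = w * x to one target by repeatedly reducing the loss."""
    losses = []
    for _ in range(steps):
        pred = w * x
        losses.append(mse_loss(pred, target))
        grad = 2.0 * (pred - target) * x   # d(loss)/dw
        w -= lr * grad                     # too-high prediction lowers w, and vice versa
    return w, losses


w, losses = train_linear(x=2.0, target=6.0)   # w approaches 3.0
```

The recorded losses start at 36.0 and shrink toward zero as the weight approaches the value that reproduces the target.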

[0108] (2) Backpropagation algorithm

[0109] During training, a neural network can use the backpropagation (BP) algorithm to correct the parameters of the initial neural network model so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, propagating the input signal forward to the output produces an error loss, and the parameters of the initial neural network model are updated by propagating this error loss information backward, so that the error loss converges. The backpropagation algorithm is a backward pass driven by the error loss, and it aims to obtain the optimal parameters of the neural network model, such as the weight matrix.
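A minimal sketch of the forward and backward passes, assuming a two-weight scalar network y = w2 * sigmoid(w1 * x) with a squared-error loss. The network shape, the function names, and the hyperparameters are illustrative assumptions rather than anything specified in this application.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(x, target, w1, w2):
    """Squared error of the network y = w2 * sigmoid(w1 * x)."""
    return (w2 * sigmoid(w1 * x) - target) ** 2


def gradients(x, target, w1, w2):
    """Forward pass, then propagate the error loss backward via the chain rule."""
    h = sigmoid(w1 * x)            # forward: hidden activation
    y = w2 * h                     # forward: output
    dy = 2.0 * (y - target)        # error signal at the output
    dw2 = dy * h                   # gradient for the output weight
    dh = dy * w2                   # error propagated back to the hidden unit
    dw1 = dh * h * (1.0 - h) * x   # through the sigmoid to the input weight
    return dw1, dw2


def train(x, target, w1=0.5, w2=0.5, lr=0.1, steps=200):
    """Update both weights against their gradients so the error loss converges."""
    for _ in range(steps):
        dw1, dw2 = gradients(x, target, w1, w2)
        w1 -= lr * dw1
        w2 -= lr * dw2
    return w1, w2


w1, w2 = train(x=1.0, target=1.0)
```

Comparing `gradients` against a finite-difference estimate of `loss` is a simple way to check that the backward pass is consistent with the forward pass.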

[0110] The method provided in this application is described below from the perspectives of neural network training and neural network application.

[0111] The training data evaluation method provided in this application can evaluate training data and select the more important training data for model training. Model training involves the processing of data sequences and can be applied to methods such as data training, machine learning, and deep learning: symbolic and formalized intelligent information modeling, extraction, preprocessing, and training are performed on the training data (e.g., the M training data in the training data evaluation method provided in this application), finally yielding a trained neural network (e.g., the third model in the model training method provided in this application). The data processing method provided in this application can then use the trained neural network: input data (e.g., the target data in the data processing method provided in this application) is fed into the trained neural network to obtain output data (e.g., the processing result of the target data in the data processing method provided in this application). It should be noted that the training data evaluation method and the data processing method provided in this application are based on the same concept, and can also be understood as two parts of one system or two stages of one overall process, such as a model training stage and a model application stage.

[0112] The training data evaluation method provided in this application can be applied to various scenarios, such as object classification and text extraction in image processing systems, object detection and segmentation in autonomous driving systems, and click-through rate prediction in item recommendation systems. The following uses click-through rate prediction in an item recommendation system as an illustrative example. As shown in Figure 4 (a schematic structural diagram of an item recommendation system provided in an embodiment of this application), the item recommendation system includes a user interface display list, logs, an offline training module, and an online prediction module.

[0113] The basic operation flow of the item recommendation system is as follows: The system provides a user interface with a display list containing multiple items that the user can interact with. Users perform a series of actions on these items, such as browsing, clicking, commenting, and downloading, generating user data which is stored in logs. The item recommendation system can then use this user data from the logs as training data for offline model training. After training convergence, a prediction model is generated. This model is deployed in the online prediction module and provides recommendation results based on user requests, item features, and contextual information. Users then provide feedback on these recommendations, generating further user data.

[0114] Throughout the recommendation system process, the user data recorded by the system includes feature data from thousands of users and items. Using all of this feature data as training data is impractical, as more feature data means more computing resources are needed, online latency increases accordingly, and costs rise. Furthermore, noisy and redundant feature data can degrade the performance of the trained prediction model. Therefore, feature data filtering is necessary, and how to accurately evaluate the feature data during this process is crucial for improving the accuracy of the subsequent prediction model.

[0115] Based on this, the training data evaluation method provided in the embodiments of this application can be used to more accurately evaluate the feature data (training data), thereby more accurately selecting the feature data and training a prediction model with better performance.

[0116] Figure 5 is a flowchart of the training data evaluation method provided in the embodiments of this application. As shown in Figure 5, the method includes:

[0117] 501. Obtain N training data, N sets of first gating parameters and N sets of second gating parameters, where N ≥ 2.

[0118] When N training data points need to be evaluated, the N training data points can be obtained first (for example, a training data point can be user feature data such as the user's name, gender, or age, or item feature data such as the item's name, price, or function), together with N sets of first gating parameters and N sets of second gating parameters (N is a positive integer greater than or equal to 2).

[0119] Since the operations performed on each of the N training data are similar, the following description will use any one of the N training data as an example, that is, the i-th training data (i = 1, ..., N) in the N training data for illustration.

[0120] It is worth noting that the i-th training data point corresponds to the i-th set of first gating parameters among the N sets of first gating parameters, and also to the i-th set of second gating parameters among the N sets of second gating parameters. The i-th set of first gating parameters can be used to strengthen the role of the i-th training data point in the model training process while weakening the role of the remaining training data points. The i-th set of second gating parameters can be used to weaken the role of the i-th training data point in the model training process while strengthening the role of the remaining training data points.

[0121] It should be noted that the i-th set of first gating parameters contains N first gating parameters, which correspond one-to-one with the N training data points. Among these N first gating parameters, the i-th first gating parameter lies within a first value range (whose upper and lower limits can be set according to actual needs and are not limited here), and the remaining first gating parameters lie within a second value range (whose upper and lower limits can likewise be set according to actual needs and are not limited here). Because the first and second value ranges do not overlap, the value of the i-th first gating parameter differs from the values of the other first gating parameters. The value of the i-th first gating parameter can be used to strengthen the effect of the i-th training data point (e.g., a value of 1), while the values of the other first gating parameters can be used to weaken the effect of the remaining training data points (e.g., values close to 0).

[0122] Similarly, the i-th set of second gating parameters contains N second gating parameters, which correspond one-to-one with the N training data points. Among these N second gating parameters, the i-th second gating parameter lies within the second value range, while the remaining second gating parameters lie within the first value range. Because the first and second value ranges do not overlap, the value of the i-th second gating parameter differs from the values of the other second gating parameters. The value of the i-th second gating parameter can be used to weaken the effect of the i-th training data point (e.g., a value close to 0), while the values of the other second gating parameters can be used to strengthen the effect of the remaining training data points (e.g., a value of 1).
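The two kinds of parameter sets can be constructed as in the sketch below. The value 1 for the first value range comes from the example in the text; the value 0.01 standing in for "close to 0", the 0-based index `i`, and the function names `first_gates` and `second_gates` are assumptions made for illustration.

```python
def first_gates(i, n, high=1.0, low=0.01):
    """i-th set of first gating parameters: high for point i, low for the rest."""
    if not 0 <= i < n:
        raise ValueError("i must index one of the n training data points")
    return [high if k == i else low for k in range(n)]


def second_gates(i, n, high=1.0, low=0.01):
    """i-th set of second gating parameters: low for point i, high for the rest."""
    if not 0 <= i < n:
        raise ValueError("i must index one of the n training data points")
    return [low if k == i else high for k in range(n)]


# For N = 3 and the second training data point (index 1):
g1 = first_gates(1, 3)    # [0.01, 1.0, 0.01]
g2 = second_gates(1, 3)   # [1.0, 0.01, 1.0]
```

Each of the N training data points gets one such pair of sets, so the complete input to the evaluation consists of N first sets and N second sets of N values each.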

[0123] 502. Based on the N training data points and the i-th set of first gating parameters, train the model to be trained to obtain the i-th first model. The i-th set of first gating parameters is used to make the i-th first model dependent on the i-th training data point, i = 1, ..., N.

[0124] 503. Based on N training data and the i-th set of second gating parameters, train the model to be trained to obtain the i-th second model. The i-th set of second gating parameters is used to make the i-th second model independent of the i-th training data.

[0125] After obtaining N training data, the i-th set of first gating parameters, and the i-th set of second gating parameters, the model to be trained (i.e., the neural network model to be trained) can be trained using the N training data and the i-th set of first gating parameters to obtain the i-th first model. Then, the model to be trained can be trained using the N training data and the i-th set of second gating parameters to obtain the i-th second model.

[0126] Since the first gating parameters of the i-th group can be used to strengthen the role of the i-th training data in the model training process and weaken the role of the remaining training data, the first model trained based on the first gating parameters of the i-th group will depend on the i-th training data, but not on the remaining training data. Similarly, since the second gating parameters of the i-th group can be used to weaken the role of the i-th training data in the model training process and strengthen the role of the remaining training data, the second model trained based on the second gating parameters of the i-th group will not depend on the i-th training data, but will depend on the remaining training data.

[0127] Specifically, the first model and the second model can be obtained in the following ways:

[0128] (1) After obtaining N training data and the i-th set of first gating parameters, the i-th set of first gating parameters can be used to perform the i-th first processing on the N training data respectively, so as to obtain N training data after the i-th first processing. It should be noted that the i-th set of first gating parameters contains N first gating parameters. Among these N first gating parameters, the i-th first processing can be performed on the 1st gating parameter and the 1st training data (for example, the i-th first processing is a multiplication process), so as to obtain the 1st training data after the i-th first processing, and the i-th first processing can be performed on the 2nd gating parameter and the 2nd training data, so as to obtain the 2nd training data after the i-th first processing, ..., and finally the i-th first processing can be performed on the N-th gating parameter and the N-th training data, so as to obtain the N-th training data after the i-th first processing.

[0129] For example, as shown in Figure 6 (a schematic diagram illustrating an application example of the training data evaluation method provided in this embodiment of the application), assume there are 10 feature data points for users and projects (i.e., the aforementioned training data), and the evaluation value of the first feature data has already been obtained. To obtain the evaluation value of the second feature data, a second set of reinforcement gating parameters (i.e., the aforementioned second set of first gating parameters) and a second set of weakening gating parameters (i.e., the aforementioned second set of second gating parameters) can be obtained. The second set of reinforcement gating parameters contains 10 reinforcement gating parameters: the first is 0.05, the second is 1, ..., and the tenth is 0.08. It can be seen that the second set of reinforcement gating parameters is used to reinforce the role played by the second feature data. The second set of weakening gating parameters contains 10 weakening gating parameters: the first is 1, the second is 0.05, ..., and the tenth is 1. The second set of weakening gating parameters is used to weaken the role of the second feature data while strengthening the role of the remaining feature data.

[0130] Then, reinforcement processing can be performed on the first reinforcement gating parameter and the first feature data to obtain the first feature data after the second reinforcement processing. Reinforcement processing can be performed on the second reinforcement gating parameter and the second feature data to obtain the second feature data after the second reinforcement processing. ... Reinforcement processing can be performed on the tenth reinforcement gating parameter and the tenth feature data to obtain the tenth feature data after the second reinforcement processing.
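As a minimal sketch of this processing, assuming it is the element-wise multiplication given as an example in the text, each gating parameter can be multiplied with its corresponding feature data. The function name apply_gates and the numeric feature values are hypothetical and serve only as an illustration.

```python
def apply_gates(features, gates):
    """Multiply each feature data by its one-to-one gating parameter."""
    if len(features) != len(gates):
        raise ValueError("features and gates must have the same length")
    return [g * x for g, x in zip(gates, features)]


# Four illustrative feature values; the second one is being evaluated.
features = [0.5, 2.0, 1.0, 4.0]
reinforce = [0.05, 1.0, 0.05, 0.05]   # strengthens the second feature
weaken = [1.0, 0.05, 1.0, 1.0]        # weakens the second feature

reinforced = apply_gates(features, reinforce)
weakened = apply_gates(features, weaken)
print(reinforced)
print(weakened)
```

After reinforcement processing the second feature keeps its full value while the others are scaled toward 0; after weakening processing the opposite holds.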

[0131] (2) After obtaining the N training data after the i-th first processing, the model to be trained can be trained using the N training data after the i-th first processing, thereby obtaining the i-th first model. Then, the obtained i-th first model depends on the i-th training data and does not depend on the remaining training data.

[0132] More specifically, the first model can also be obtained in the following ways:

[0133] (2.1) After obtaining the N training data after the i-th first processing, the N training data after the i-th first processing can be input into the model to be trained, so that the model to be trained processes them, thereby obtaining the i-th first processing result of the N training data.

[0134] (2.2) After obtaining the i-th first processing result, a calculation can be performed on the i-th first processing result of the N training data and the true processing result of the N training data to obtain the i-th first loss (used to indicate the difference between the i-th first processing result and the true processing result). Then, the i-th first gradient can be calculated based on the i-th first loss, and the i-th first gradient and the i-th first learning rate (used to indicate how to use the i-th first gradient to update the parameters of the model to be trained, that is, the magnitude of the parameter update) can be used to update the parameters of the model to be trained to obtain the i-th first model.

[0135] It should be noted that the value of the i-th first learning rate can be dynamically adjusted. Initially, the i-th first learning rate can be a preset value. After training the model to be trained using the i-th first gradient and the i-th first learning rate to obtain the i-th first model, if the difference between the performance of the i-th first model and the performance of the model to be trained is outside the third value range (the upper and lower bounds of the third value range can be set according to actual needs, and are not restricted here), the value of the i-th first learning rate can be readjusted (for example, to half of the preset value), and the model to be trained can be retrained using the i-th first gradient and the i-th first learning rate until the difference between the performance of the i-th first model and the performance of the model to be trained is within the third value range.

[0136] Continuing with the example above, after obtaining the 10 feature data points following the second reinforcement processing, these 10 feature data points can be input into model M0. Model M0 can process these 10 feature data points to obtain the second reinforcement processing result (i.e., the aforementioned second first processing result), and calculate the second reinforcement gradient (i.e., the aforementioned second first gradient) based on the second reinforcement processing result. Then, the second reinforcement learning rate (i.e., the aforementioned second first learning rate) can be obtained. At this time, the second reinforcement learning rate is the initial preset value γ. The parameters of M0 can be updated using the second reinforcement gradient and the second reinforcement learning rate to obtain model M1. If the difference between the performance of M1 and the performance of M0 exceeds the confidence level ε, the second reinforcement learning rate is readjusted to 0.5γ, and the parameters of M0 are updated again using the second reinforcement gradient and the second reinforcement learning rate until the difference between the performance of M1 and the performance of M0 does not exceed ε. In this way, the resulting M1 is strongly dependent on the second feature data, and weakly dependent on the remaining feature data.
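The dynamic learning rate adjustment in this example can be sketched as follows. The sketch makes several assumptions that are not stated in the application: the model is a plain linear model, the loss and the performance measure are both the mean squared error on the gated data, and the names predict, mse, gradient and gated_update are hypothetical. Each row holds the feature data of one sample.

```python
def predict(w, x):
    """Linear model output for one sample (assumed model form)."""
    return sum(wj * xj for wj, xj in zip(w, x))


def mse(w, rows, targets):
    """Mean squared error, used here as both loss and performance."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(rows, targets)) / len(rows)


def gradient(w, rows, targets):
    """Gradient of the mean squared error with respect to w."""
    grad = [0.0] * len(w)
    for x, y in zip(rows, targets):
        err = predict(w, x) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / len(rows)
    return grad


def gated_update(w0, rows, targets, gates, lr=1.0, eps=0.5, max_halvings=30):
    """Update M0 once on gated data, halving lr until the performance
    difference between the updated model and M0 is at most eps."""
    gated = [[g * xj for g, xj in zip(gates, x)] for x in rows]
    grad = gradient(w0, gated, targets)      # gradient is computed once at M0
    base = mse(w0, gated, targets)
    for _ in range(max_halvings):
        w1 = [wj - lr * gj for wj, gj in zip(w0, grad)]
        if abs(mse(w1, gated, targets) - base) <= eps:
            return w1, lr
        lr *= 0.5                            # readjust, e.g. gamma -> 0.5 * gamma
    raise RuntimeError("no learning rate met the tolerance")


rows = [[1.0, 2.0, 3.0], [2.0, 0.5, 1.0]]
targets = [1.0, 0.0]
w0 = [0.0, 0.0, 0.0]
gates = [0.05, 1.0, 0.05]                    # reinforce the second feature
w1, lr = gated_update(w0, rows, targets, gates, lr=1.0, eps=0.5)
print(w1, lr)
```

The same routine yields the weakened model when it is called with a group of weakening gating parameters instead.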

[0137] (3) After obtaining N training data and the i-th set of second gating parameters, the i-th set of second gating parameters can be used to perform the i-th second processing on the N training data respectively, so as to obtain N training data after the i-th second processing. It should be noted that the i-th set of second gating parameters contains N second gating parameters. Among these N second gating parameters, the i-th second processing can be performed on the 1st gating parameter and the 1st training data (for example, the i-th second processing is a multiplication process), so as to obtain the 1st training data after the i-th second processing, and the i-th second processing can be performed on the 2nd gating parameter and the 2nd training data, so as to obtain the 2nd training data after the i-th second processing, ..., and finally the i-th second processing can be performed on the N-th gating parameter and the N-th training data, so as to obtain the N-th training data after the i-th second processing.

[0138] For example, as shown in Figure 7 (another schematic diagram illustrating an application example of the training data evaluation method provided in the embodiments of this application; Figure 7 is drawn on the basis of Figure 6), weakening processing can be performed on the first weakening gating parameter and the first feature data to obtain the first feature data after the second weakening processing. Weakening processing can be performed on the second weakening gating parameter and the second feature data to obtain the second feature data after the second weakening processing. ... Weakening processing can be performed on the tenth weakening gating parameter and the tenth feature data to obtain the tenth feature data after the second weakening processing.

[0139] (4) After obtaining the N training data after the i-th second processing, the N training data after the i-th second processing can be used to train the model to be trained, thereby obtaining the i-th second model. Then, the obtained i-th second model does not depend on the i-th training data, but depends on the remaining training data.

[0140] More specifically, the second model can also be obtained in the following ways:

[0141] (4.1) After obtaining the N training data after the i-th second processing, the N training data after the i-th second processing can be input into the model to be trained, so that the N training data after the i-th second processing can be processed by the model to be trained, thereby obtaining the i-th second processing result of the N training data.

[0142] (4.2) After obtaining the i-th second processing result, a calculation can be performed on the i-th second processing result of the N training data and the true processing result of the N training data to obtain the i-th second loss (used to indicate the difference between the i-th second processing result and the true processing result). Then, the i-th second gradient can be calculated based on the i-th second loss, and the i-th second gradient and the i-th second learning rate (used to indicate how to use the i-th second gradient to update the parameters of the model to be trained, that is, the magnitude of the parameter update) can be used to update the parameters of the model to be trained to obtain the i-th second model.

[0143] It should be noted that the value of the i-th second learning rate can be dynamically adjusted. Initially, the i-th second learning rate can be a preset value. After training the model to be trained using the i-th second gradient and the i-th second learning rate to obtain the i-th second model, if the difference between the performance of the i-th second model and the performance of the model to be trained is outside the third value range, the value of the i-th second learning rate can be readjusted (for example, to half of the preset value), and the model to be trained can be retrained using the i-th second gradient and the i-th second learning rate until the difference between the performance of the i-th second model and the performance of the model to be trained is within the third value range.

[0144] Continuing with the example above, after obtaining the 10 feature data points following the second weakening processing, these 10 feature data points can be input into model M0. Model M0 can process these 10 feature data points to obtain the second weakening processing result (i.e., the aforementioned second second processing result), and calculate the second weakening gradient (i.e., the aforementioned second second gradient) based on the second weakening processing result. Then, the second weakening learning rate (i.e., the aforementioned second second learning rate) can be obtained. At this time, the second weakening learning rate is the initial preset value γ. The parameters of M0 can be updated using the second weakening gradient and the second weakening learning rate to obtain model M2. If the difference between the performance of M2 and the performance of M0 exceeds the confidence level ε, the second weakening learning rate is readjusted to 0.5γ, and the parameters of M0 are updated again using the second weakening gradient and the second weakening learning rate until the difference between the performance of M2 and the performance of M0 does not exceed ε. In this way, the resulting M2 is weakly dependent on the second feature data, but strongly dependent on the remaining feature data.

[0145] 504. Based on the i-th first model and the i-th second model, obtain the evaluation value of the i-th training data. The evaluation value of the i-th training data is used to indicate the importance of the i-th training data.

[0146] After obtaining the i-th first model and the i-th second model, a series of processes can be performed on them to obtain the evaluation value of the i-th training data. The evaluation value of the i-th training data indicates its importance. Then, the same operations can be performed on the remaining training data (excluding the i-th training data), thus ultimately obtaining the evaluation values of N training data. This completes the evaluation of the N training data.

[0147] Specifically, the evaluation value of the i-th training data can be obtained in the following way:

[0148] (1) Obtain N new training data points that correspond one-to-one with the N training data points. The N training data points can be collected from the first dataset, and the N new training data points can be collected from the second dataset. The first dataset and the second dataset are different datasets. It should be noted that the one-to-one correspondence between the N training data points and the N new training data points mentioned here usually means that the type of the i-th training data point is the same as the type of the i-th new training data point.

[0149] (2) After obtaining N new training data, the N new training data can be input into the i-th first model so that the i-th first model can process the N new training data to obtain the upper limit of the evaluation value of the i-th training data.

[0150] More specifically, the upper limit of the evaluation value of the i-th training data can be obtained in the following way:

[0151] (2.1) After obtaining N new training data, the N new training data can be input into the i-th first model so that the N new training data can be processed by the i-th first model to obtain the i-th third processing result of the N new training data.

[0152] (2.2) Among the N new training data, a perturbation can be added to the i-th new training data to change its content and properties, resulting in the perturbed i-th new training data (for example, if the i-th new training data represents the user's gender as male, the perturbed i-th new training data will represent the user's gender as female). Then, the remaining new training data (excluding the i-th new training data) and the perturbed i-th new training data can be input into the i-th first model. The i-th first model can then process the remaining new training data and the perturbed i-th new training data to obtain the i-th fourth processing result of the N new training data.

[0153] (2.3) After obtaining the i-th third processing result and the i-th fourth processing result, a calculation can be performed on the i-th third processing result and the actual processing results of the N new training data to obtain the i-th third loss. Similarly, a calculation can be performed on the i-th fourth processing result and the actual processing results of the N new training data to obtain the i-th fourth loss. The change between the i-th third loss and the i-th fourth loss can then be used as the upper limit of the evaluation value of the i-th training data.
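Steps (2.1) to (2.3) can be illustrated with the following sketch. It assumes a linear model, a mean squared error loss, and that the "change" between the two losses is taken as their absolute difference; none of these choices is fixed by the application, and the names predict, mse, perturb and loss_change are hypothetical. Applying the same function to the i-th second model would give the lower limit.

```python
def predict(w, x):
    """Linear model output for one sample (assumed model form)."""
    return sum(wj * xj for wj, xj in zip(w, x))


def mse(w, rows, targets):
    """Mean squared error between model outputs and actual results."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(rows, targets)) / len(rows)


def perturb(rows, i, flip):
    """Return a copy of rows in which only the i-th feature is perturbed."""
    return [x[:i] + [flip(x[i])] + x[i + 1:] for x in rows]


def loss_change(w, rows, targets, i, flip):
    """Absolute change in loss caused by perturbing the i-th feature."""
    clean_loss = mse(w, rows, targets)                          # third loss
    perturbed_loss = mse(w, perturb(rows, i, flip), targets)    # fourth loss
    return abs(perturbed_loss - clean_loss)


# A model that depends only on the second feature (0-based index 1).
w_first = [0.0, 1.0, 0.0]
new_rows = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
new_targets = [1.0, 0.0]
flip_binary = lambda v: 1.0 - v      # e.g. male <-> female for a 0/1 field

upper_1 = loss_change(w_first, new_rows, new_targets, 1, flip_binary)
upper_0 = loss_change(w_first, new_rows, new_targets, 0, flip_binary)
print(upper_1, upper_0)
```

In this toy case perturbing the feature the model relies on changes the loss, while perturbing a feature it ignores leaves the loss unchanged.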

[0154] (3) After obtaining N new training data, the N new training data can be input into the i-th second model so that the N new training data can be processed by the i-th second model to obtain the lower limit of the evaluation value of the i-th training data.

[0155] More specifically, the lower bound of the evaluation value for the i-th training data can be obtained in the following way:

[0156] (3.1) After obtaining N new training data, the N new training data can be input into the i-th second model so that the N new training data can be processed by the i-th second model to obtain the i-th fifth processing result of the N new training data.

[0157] (3.2) Among the N new training data, a perturbation can be added to the i-th new training data to change its content and properties, resulting in the i-th new training data with the perturbation. Then, the remaining new training data (excluding the i-th new training data) and the i-th new training data with the perturbation can be input into the i-th second model. The i-th second model can then process the remaining new training data and the i-th new training data with the perturbation, thereby obtaining the i-th sixth processing result of the N new training data.

[0158] (3.3) After obtaining the i-th fifth processing result and the i-th sixth processing result, a calculation can be performed on the i-th fifth processing result and the actual processing results of the N new training data to obtain the i-th fifth loss. Similarly, a calculation can be performed on the i-th sixth processing result and the actual processing results of the N new training data to obtain the i-th sixth loss. The change between the i-th fifth loss and the i-th sixth loss can then be used as the lower limit of the evaluation value of the i-th training data.

[0159] (4) Obtaining the upper limit of the evaluation value of the i-th training data and the lower limit of the evaluation value of the i-th training data is equivalent to obtaining the evaluation value interval (also known as the importance interval) of the i-th training data. Then, a calculation (e.g., averaging the upper limit and the lower limit) can be performed on this evaluation value interval to obtain the evaluation value of the i-th training data.
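As a small sketch of the averaging example, the evaluation value of each training data can be taken as the midpoint of its evaluation value interval. The function name evaluation_values and the numbers below are illustrative assumptions only.

```python
def evaluation_values(uppers, lowers):
    """Average each upper limit with its lower limit (one pair per training data)."""
    if len(uppers) != len(lowers):
        raise ValueError("each training data needs one upper and one lower limit")
    return [(u + l) / 2.0 for u, l in zip(uppers, lowers)]


uppers = [1.0, 0.25]   # obtained with the first models
lowers = [0.5, 0.0]    # obtained with the second models
values = evaluation_values(uppers, lowers)
print(values)          # [0.75, 0.125]
```

Averaging is only one possible calculation; other ways of reducing the interval to a single value are not excluded by the text.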

[0160] Furthermore, after step 504, the following steps may also be performed:

[0161] 505. After obtaining the evaluation values of N training data, the top M training data with the highest evaluation values can be selected from the N training data (M is a positive integer less than or equal to N and greater than or equal to 1). These M training data are the more important M training data, thus completing the selection of training data.
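The selection in step 505 can be sketched as a simple ranking by evaluation value. The function name select_top_m and the evaluation values are hypothetical; indices are 0-based, and ties keep their original order because Python's sort is stable.

```python
def select_top_m(values, m):
    """Return the indices of the m training data with the highest evaluation values."""
    if not 1 <= m <= len(values):
        raise ValueError("m must satisfy 1 <= m <= N")
    ranked = sorted(range(len(values)), key=lambda k: values[k], reverse=True)
    return ranked[:m]


values = [0.75, 0.125, 0.9, 0.4]
selected = select_top_m(values, m=2)
print(selected)   # [2, 0]
```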

[0162] 506. After obtaining the M training data points, these M training data points can be used as the current batch of training data and input into the model to be trained to obtain the corresponding processing results. Based on these results, the parameters of the model to be trained are updated to obtain the updated model. Then, the next batch of training data (which originates from the first dataset and contains M training data points that correspond one-to-one with the M training data points in the current batch) is used to continue training the updated model until the model training conditions are met, thus obtaining the third model (i.e., the trained neural network model, for example, the prediction model in Figure 4).

[0163] Furthermore, the methods provided in the embodiments of this application and those provided by related technologies can be compared. Assuming there are 200 candidate feature data points, the goal is to select approximately 60 feature data points to reduce model training latency and improve the accuracy of the trained model. The comparison results are shown in Table 1.

[0164] Table 1

[0165]

[0166] As shown in Table 1, the feature data evaluation, feature data screening, and subsequent model training completed by the method provided in the embodiments of this application have better results.

[0167] Furthermore, the method provided in this application embodiment can be applied to various business scenarios, and the improvement in effect is shown in Table 2:

[0168] Table 2

[0169]

[0170] In this embodiment, when N training data points need to be evaluated, the N training data points, N sets of first gating parameters, and N sets of second gating parameters can first be obtained. Then, the model to be trained can be trained based on the N training data points and the i-th set of first gating parameters to obtain the i-th first model, and the model to be trained can be trained based on the N training data points and the i-th set of second gating parameters to obtain the i-th second model. Then, a series of processing steps can be performed on the i-th first model and the i-th second model to obtain the evaluation value of the i-th training data point. In this way, the evaluation values of the N training data points can finally be obtained. In the aforementioned process, since the i-th set of first gating parameters can be used to make the i-th first model dependent on the i-th training data, and the i-th set of second gating parameters can be used to make the i-th second model independent of the i-th training data, the model training process considers not only the case where the i-th training data plays a significant role, but also the case where the i-th training data plays a minor role. The importance of the i-th training data is comprehensively evaluated based on the models (the i-th first model and the i-th second model) obtained from these two cases. This method considers a relatively comprehensive range of factors, so the final evaluation values of the N training data have high accuracy, thus enabling accurate selection of training data.

[0171] Furthermore, in this embodiment of the application, for the i-th training data, the upper limit of the evaluation value of the i-th training data (the upper importance limit) and the lower limit of the evaluation value of the i-th training data (the lower importance limit) can be accurately analyzed, thereby obtaining the evaluation value range (importance range) of the i-th training data. Based on this range, the evaluation value of the i-th training data can be given more accurately, and thus the selection of training data can be completed more accurately.

[0172] Furthermore, in the embodiments of this application, during the process of obtaining the evaluation value of the i-th training data, the lower limit of the performance of the model (the i-th first model and the i-th second model) can be guaranteed by dynamically controlling the learning rate. This is beneficial to further improve the accuracy of the evaluation value of the i-th training data, thereby completing the selection of training data more accurately.

[0173] The above is a detailed description of the training data evaluation method provided in the embodiments of this application. The data processing method provided in the embodiments of this application will be described below. Figure 8 is a flowchart of the data processing method provided in this embodiment of the application. As shown in Figure 8, the method includes:

[0174] 801. Obtain M target data.

[0175] In this embodiment, M target data to be processed can be obtained. These M target data correspond one-to-one with the M training data points selected in step 505 of the embodiment shown in Figure 5, meaning the types of the M target data points are similar to the types of the M training data points. For example, the first training data point is the user's gender, and the first target data point is also the user's gender, ..., the M-th training data point is the price of the project, and the M-th target data point is also the price of the project. In other words, the M target data points contain relevant information about the user and relevant information about several projects.

[0176] 802. The third model is used to process the M target data to obtain the processing results of the M target data.

[0177] After obtaining the M target data points, the M target data points can be input into the third model trained in step 506 of the embodiment shown in Figure 5, so that the third model processes the M target data to obtain the processing results of the M target data. Continuing with the example above, let the third model be a prediction model, and the M target data contain user-related information and information about several items. After processing the M target data, the prediction model can predict the probability that these items will be recommended to the user.

[0178] The above is a detailed description of the training data evaluation method and data processing method provided in the embodiments of this application. The training data evaluation device and data processing device provided in the embodiments of this application will be described below. Figure 9 is a schematic diagram of the training data evaluation device provided in the embodiments of this application. As shown in Figure 9, the device includes:

[0179] The acquisition module 901 is used to acquire N training data, N sets of first gating parameters and N sets of second gating parameters, where N≥2;

[0180] The first training module 902 is used to train the model to be trained based on N training data and the i-th set of first gating parameters to obtain the i-th first model. The i-th set of first gating parameters is used to make the i-th first model dependent on the i-th training data, i = 1, ..., N;

[0181] The second training module 903 is used to train the model to be trained based on N training data and the i-th set of second gating parameters to obtain the i-th second model. The i-th set of second gating parameters is used to make the i-th second model independent of the i-th training data.

[0182] Evaluation module 904 is used to obtain the evaluation value of the i-th training data based on the i-th first model and the i-th second model. The evaluation value of the i-th training data is used to indicate the importance of the i-th training data.

[0183] In this embodiment, when N training data points need to be evaluated, the N training data points, N sets of first gating parameters, and N sets of second gating parameters can first be obtained. Then, the model to be trained can be trained based on the N training data points and the i-th set of first gating parameters to obtain the i-th first model, and the model to be trained can be trained based on the N training data points and the i-th set of second gating parameters to obtain the i-th second model. Then, a series of processing steps can be performed on the i-th first model and the i-th second model to obtain the evaluation value of the i-th training data point. In this way, the evaluation values of the N training data points can finally be obtained. In the aforementioned process, since the i-th set of first gating parameters can be used to make the i-th first model dependent on the i-th training data, and the i-th set of second gating parameters can be used to make the i-th second model independent of the i-th training data, the model training process considers not only the case where the i-th training data plays a significant role, but also the case where the i-th training data plays a minor role. The importance of the i-th training data is comprehensively evaluated based on the models (the i-th first model and the i-th second model) obtained from these two cases. This method considers a relatively comprehensive range of factors, so the final evaluation values of the N training data have high accuracy, thus enabling accurate selection of training data.

[0184] In one possible implementation, the first training module is configured to: perform a first processing on N training data based on the i-th set of first gating parameters to obtain N training data after the i-th first processing, wherein the i-th set of first gating parameters includes N first gating parameters that correspond one-to-one with the N training data, the i-th first gating parameter is within a first value range, and the remaining first gating parameters are within a second value range; and train the model to be trained based on the N training data after the i-th first processing to obtain the i-th first model.

[0185] In one possible implementation, the second training module is used to: perform a second processing on N training data based on the i-th set of second gating parameters to obtain N training data after the i-th second processing, wherein the i-th set of second gating parameters includes N second gating parameters that correspond one-to-one with the N training data, the i-th second gating parameter is within a second value range, and the remaining second gating parameters are within a first value range; and train the model to be trained based on the N training data after the i-th second processing to obtain the i-th second model.

[0186] In one possible implementation, the first training module is used to: input N training data after the i-th first processing into the model to be trained to obtain the i-th first processing result; update the parameters of the model to be trained based on the i-th first processing result and the i-th first learning rate to obtain the i-th first model, wherein the i-th first learning rate is used to ensure that the difference between the performance of the i-th first model and the performance of the model to be trained is within a third value range.

[0187] In one possible implementation, the second training module is used to: input N training data after the i-th second processing into the model to be trained to obtain the i-th second processing result; update the parameters of the model to be trained based on the i-th second processing result and the i-th second learning rate to obtain the i-th second model, wherein the i-th second learning rate is used to ensure that the difference between the performance of the i-th second model and the performance of the model to be trained is within a third value range.

[0188] In one possible implementation, the evaluation module is used to: acquire N new training data points, each corresponding one-to-one with one of the existing training data points, where the existing training data points originate from a first dataset and the new training data points originate from a second dataset; process the N new training data points using the i-th first model to obtain the upper bound of the evaluation value of the i-th training data point; process the N new training data points using the i-th second model to obtain the lower bound of the evaluation value of the i-th training data point; and obtain the evaluation value of the i-th training data point based on the upper bound and the lower bound of the evaluation value of the i-th training data point.

[0189] In one possible implementation, the evaluation module is used to: input N new training data into the i-th first model to obtain the i-th third processing result; input the remaining new training data other than the i-th new training data and the i-th new training data after perturbation into the i-th first model to obtain the i-th fourth processing result; and obtain the upper limit of the evaluation value of the i-th training data based on the i-th third processing result and the i-th fourth processing result.

[0190] In one possible implementation, the evaluation module is used to: input the N new training data into the i-th second model to obtain the i-th fifth processing result; input the perturbed i-th new training data, together with the remaining new training data other than the i-th new training data, into the i-th second model to obtain the i-th sixth processing result; and obtain the lower limit of the evaluation value of the i-th training data based on the i-th fifth processing result and the i-th sixth processing result.
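The perturbation-based limits described above can be sketched as follows. This is only an illustration under several assumptions that the application does not state: the "processing result" is taken to be the mean model output over the batch, the perturbation is an additive shift `delta`, each limit is the absolute change in the processing result, and the final evaluation value is the midpoint of the two limits. The functions `batch_result`, `sensitivity`, and `evaluation_value` and the two toy models are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], float]


def batch_result(model: Model, data: Sequence[float]) -> float:
    """Assumed 'processing result': the mean model output over the batch."""
    return sum(model(x) for x in data) / len(data)


def sensitivity(model: Model, new_data: Sequence[float], i: int, delta: float) -> float:
    """Change in the batch result when only the i-th new sample is perturbed."""
    perturbed: List[float] = list(new_data)
    perturbed[i] += delta  # assumed perturbation: additive shift
    return abs(batch_result(model, perturbed) - batch_result(model, new_data))


def evaluation_value(first_model: Model, second_model: Model,
                     new_data: Sequence[float], i: int,
                     delta: float = 0.5) -> Tuple[float, float, float]:
    upper = sensitivity(first_model, new_data, i, delta)   # from the i-th first model
    lower = sensitivity(second_model, new_data, i, delta)  # from the i-th second model
    # Assumed combination rule: midpoint of the two limits.
    return upper, lower, 0.5 * (upper + lower)


# Hypothetical stand-ins for the i-th first and second models.
first_model: Model = lambda x: 2.18 * x
second_model: Model = lambda x: 2.0 * x

upper, lower, value = evaluation_value(first_model, second_model, [1.0, 2.0, 3.0], i=2)
```

Any other way of comparing the two processing results, or of combining the two limits, would fit the same structure.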

[0191] In one possible implementation, the device further includes: a selection module for selecting M training data from the N training data based on the evaluation values of the N training data, where N≥M≥1; and a third training module for updating the parameters of the model to be trained based on the M training data until the model training conditions are met, thereby obtaining a third model.
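A minimal sketch of the selection step, assuming that "based on the evaluation values" means keeping the M training data with the highest evaluation values; the application does not fix the selection rule, and `select_top_m` is a hypothetical helper.

```python
from typing import List, Sequence


def select_top_m(values: Sequence[float], m: int) -> List[int]:
    """Return the indices of the m highest evaluation values, in original order."""
    n = len(values)
    if not 1 <= m <= n:
        raise ValueError("require N >= M >= 1")
    ranked = sorted(range(n), key=lambda j: values[j], reverse=True)
    return sorted(ranked[:m])


evaluation_values = [0.12, 0.80, 0.05, 0.47]  # made-up values for N = 4
selected = select_top_m(evaluation_values, m=2)
```

The selected indices would then identify the M training data used to update the model to be trained until the training conditions are met.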

[0192] Figure 10 is a schematic diagram of the data processing apparatus provided in the embodiments of this application. As shown in Figure 10, the device includes:

[0193] The acquisition module 1001 is used to acquire M target data.

[0194] The processing module 1002 is used to process M target data through a third model to obtain the processing results of the M target data.

[0195] It should be noted that the information interaction and execution process between the modules / units of the above-mentioned device are based on the same concept as the method embodiment of this application, and the resulting technical effects are the same as those of the method embodiment of this application. For details, please refer to the description in the method embodiment shown above in the embodiment of this application, and it will not be repeated here.

[0196] This application also relates to an execution device. Figure 11 is a schematic diagram of the execution device provided in an embodiment of this application. As shown in Figure 11, the execution device 1100 can specifically be a mobile phone, tablet, laptop, smart wearable device, server, etc., and is not limited here. The execution device 1100 may be deployed with the data processing apparatus described in the embodiment corresponding to Figure 10, to implement the data processing function described in the embodiment corresponding to Figure 8. Specifically, the execution device 1100 includes: a receiver 1101, a transmitter 1102, a processor 1103, and a memory 1104 (the execution device 1100 may have one or more processors 1103; Figure 11 takes one processor as an example). The processor 1103 may include an application processor 11031 and a communication processor 11032. In some embodiments of this application, the receiver 1101, transmitter 1102, processor 1103, and memory 1104 may be connected via a bus or by other means.

[0197] Memory 1104 may include read-only memory and random access memory, and provides instructions and data to processor 1103. A portion of memory 1104 may also include non-volatile random access memory (NVRAM). Memory 1104 stores processor and operation instructions, executable modules, or data structures, or subsets thereof, or extended sets thereof, wherein the operation instructions may include various operation instructions for implementing various operations.

[0198] Processor 1103 controls the operation of the execution device. In specific applications, the various components of the execution device are coupled together through a bus system, which may include not only the data bus, but also power buses, control buses, and status signal buses. However, for clarity, all buses are referred to as the bus system in the diagram.

[0199] The methods disclosed in the embodiments of this application can be applied to or implemented by the processor 1103. The processor 1103 can be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by the integrated logic circuitry in the hardware of the processor 1103 or by instructions in software form. The processor 1103 can be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. The processor 1103 can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be directly embodied in the execution of a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module can reside in a mature storage medium in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. This storage medium is located in memory 1104. Processor 1103 reads the information in memory 1104 and, in conjunction with its hardware, completes the steps of the above method.

[0200] Receiver 1101 can be used to receive input digital or character information, and to generate signal inputs related to the settings and function control of the execution device. Transmitter 1102 can be used to output digital or character information through the first interface; transmitter 1102 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; transmitter 1102 may also include a display device such as a display screen.

[0201] In one embodiment of this application, the processor 1103 is used to obtain the processing result of the target data through the third model in the embodiment corresponding to Figure 8.

[0202] This application also relates to a training device. Figure 12 is a schematic diagram of the structure of a training device provided in an embodiment of this application. As shown in Figure 12, the training device 1200 is implemented by one or more servers. The training device 1200 can vary significantly due to different configurations or performance. It may include one or more central processing units (CPUs) 1212 (e.g., one or more processors), memory 1232, and one or more storage media 1230 (e.g., one or more mass storage devices) for storing application programs 1242 or data 1244. The memory 1232 and storage media 1230 can be temporary or persistent storage. The program stored in the storage media 1230 may include one or more modules (not shown in the figure), each module including a series of instruction operations on the training device. Furthermore, the CPU 1212 may be configured to communicate with the storage media 1230 and execute the series of instruction operations in the storage media 1230 on the training device 1200.

[0203] The training device 1200 may also include one or more power supplies 1226, one or more wired or wireless network interfaces 1250, one or more input / output interfaces 1258, and/or one or more operating systems 1241, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

[0204] Specifically, the training device can perform the training data evaluation method in the embodiment corresponding to Figure 5 to complete the evaluation of the training data, thereby screening the training data and training the third model.

[0205] This application also relates to a computer storage medium storing a program for signal processing, which, when run on a computer, causes the computer to perform steps as performed by the aforementioned execution device, or causes the computer to perform steps as performed by the aforementioned training device.

[0206] This application also relates to a computer program product that stores instructions that, when executed by a computer, cause the computer to perform steps as performed by the aforementioned execution device, or to perform steps as performed by the aforementioned training device.

[0207] The execution device, training device, or terminal device provided in this application embodiment can specifically be a chip. The chip includes a processing unit and a communication unit. The processing unit can be, for example, a processor, and the communication unit can be, for example, an input / output interface, pins, or circuits. The processing unit can execute computer-executable instructions stored in the storage unit to cause the chip within the execution device to execute the data processing method described in the above embodiments, or to cause the chip within the training device to execute the data processing method described in the above embodiments. Optionally, the storage unit can be a storage unit within the chip, such as a register or cache. Alternatively, the storage unit can be a storage unit located outside the chip within the wireless access device, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).

[0208] For details, please refer to Figure 13, which is a schematic diagram of the chip provided in an embodiment of this application. The chip can be represented as a neural network processor (NPU) 1300. The NPU 1300 is mounted as a coprocessor on the host CPU, and tasks are assigned by the host CPU. The core part of the NPU is the arithmetic circuit 1303; the controller 1304 controls the arithmetic circuit 1303 to extract matrix data from memory and perform multiplication operations.

[0209] In some implementations, the arithmetic circuit 1303 internally includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 1303 is a two-dimensional systolic array. The arithmetic circuit 1303 can also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1303 is a general-purpose matrix processor.

[0210] For example, suppose we have an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit retrieves the corresponding data of matrix B from the weight memory 1302 and caches it in each PE of the arithmetic circuit. The arithmetic circuit retrieves the data of matrix A from the input memory 1301 and performs matrix operations with matrix B. The partial result or the final result of the obtained matrix is stored in the accumulator 1308.
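The matrix example above can be mimicked in software. The sketch below is not a model of the hardware: it only shows partial products being added into an accumulator until the final result C = A·B is reached. The function name `matmul_accumulate` and the `tile` parameter, which controls how many inner-dimension terms form one partial result, are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul_accumulate(a: Matrix, b: Matrix, tile: int = 2) -> Matrix:
    """Compute a @ b by adding partial results into an accumulator."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions of A and B must match")
    acc = [[0.0] * cols for _ in range(rows)]  # plays the role of the accumulator
    for k0 in range(0, inner, tile):
        k1 = min(k0 + tile, inner)
        for r in range(rows):
            for c in range(cols):
                # Partial result over one slice of the inner dimension.
                partial = sum(a[r][k] * b[k][c] for k in range(k0, k1))
                acc[r][c] += partial
    return acc


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # input matrix
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # weight matrix
C = matmul_accumulate(A, B)               # [[4.0, 5.0], [10.0, 11.0]]
```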

[0211] Unified memory 1306 is used to store input and output data. Weight data is directly transferred to weight memory 1302 via Direct Memory Access Controller (DMAC) 1305. Input data is also transferred to unified memory 1306 via DMAC.

[0212] BIU stands for Bus Interface Unit, which is used for interaction between the AXI bus and the DMAC and the Instruction Fetch Buffer (IFB) 1309.

[0213] The Bus Interface Unit (BIU) 1313 is used by the instruction fetch buffer 1309 to fetch instructions from external memory, and by the direct memory access controller 1305 to fetch the original data of the input matrix A or the weight matrix B from external memory.

[0214] The DMAC is mainly used to move input data from the external memory (DDR) to the unified memory 1306, to move weight data to the weight memory 1302, or to move input data to the input memory 1301.

[0215] The vector computation unit 1307 includes multiple processing units that, when needed, further process the output of the arithmetic circuit 1303, for example through vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparisons. It is mainly used for computation in non-convolutional / fully connected layers of neural networks, such as Batch Normalization, pixel-level summation, and upsampling of the predicted label plane.

[0216] In some implementations, the vector computation unit 1307 can store the processed output vector in the unified memory 1306. For example, the vector computation unit 1307 can apply a linear or nonlinear function to the output of the arithmetic circuit 1303, such as linearly interpolating the predicted label plane extracted from the convolutional layer, or accumulating a vector of values to generate activation values. In some implementations, the vector computation unit 1307 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as activation input to the arithmetic circuit 1303, for example, for use in subsequent layers of the neural network.
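To make the post-processing role of the vector computation unit concrete, the sketch below normalizes a vector of matrix-multiplication outputs and then applies a nonlinear function to obtain activation values. The choice of ReLU as the nonlinearity, the `eps` constant, and the function names are assumptions for illustration; the application does not specify them.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Shift to zero mean and scale to (approximately) unit variance."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def relu(v: Sequence[float]) -> List[float]:
    """Assumed nonlinear function: clamp negative values to zero."""
    return [max(0.0, x) for x in v]


matmul_output = [4.0, 5.0, 10.0, 11.0]  # e.g. flattened output of the arithmetic circuit
normalized = normalize(matmul_output)
activations = relu(normalized)           # could feed a subsequent layer
```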

[0217] The instruction fetch buffer 1309 connected to the controller 1304 is used to store the instructions used by the controller 1304.

[0218] Unified memory 1306, input memory 1301, weight memory 1302, and instruction fetch buffer 1309 are all on-chip memories. External memory is proprietary to this NPU hardware architecture.

[0219] The processor mentioned above can be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the above program.

[0220] It should also be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. In addition, in the device embodiment drawings provided in this application, the connection relationship between modules indicates that they have a communication connection, which can be implemented as one or more communication buses or signal lines.

[0221] Through the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus necessary general-purpose hardware, or it can be implemented by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memory, special-purpose components, etc. Generally, any function performed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structure used to implement the same function can also be diverse, such as analog circuits, digital circuits, or special-purpose circuits. However, for this application, software program implementation is more often the preferred implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, mobile hard disk, ROM, RAM, magnetic disk, or optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, training equipment, or network device, etc.) to execute the methods described in the various embodiments of this application.

[0222] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product.

[0223] The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a training device or data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state drives (SSDs)).

Claims

1. A training data evaluation method, characterized in that the method includes: obtaining N training data, N sets of first gating parameters, and N sets of second gating parameters, where N ≥ 2; training the model to be trained based on the N training data and the i-th set of first gating parameters to obtain the i-th first model, wherein the i-th set of first gating parameters is used to make the i-th first model dependent on the i-th training data, i=1,...,N; training the model to be trained based on the N training data and the i-th set of second gating parameters to obtain the i-th second model, wherein the i-th set of second gating parameters is used to make the i-th second model independent of the i-th training data; and obtaining, based on the i-th first model and the i-th second model, an evaluation value for the i-th training data, wherein the evaluation value of the i-th training data is used to indicate the importance of the i-th training data; wherein obtaining the evaluation value of the i-th training data based on the i-th first model and the i-th second model includes: obtaining N new training data in one-to-one correspondence with the N training data, wherein the N training data originate from a first dataset and the N new training data originate from a second dataset; processing the N new training data using the i-th first model to obtain the upper limit of the evaluation value of the i-th training data; processing the N new training data using the i-th second model to obtain the lower limit of the evaluation value of the i-th training data; and obtaining the evaluation value of the i-th training data based on the upper limit and the lower limit of the evaluation value of the i-th training data.

2. The method of claim 1, wherein training the model to be trained based on the N training data and the i-th set of first gating parameters to obtain the i-th first model includes: subjecting the N training data to the i-th first processing based on the i-th set of first gating parameters to obtain the N training data after the i-th first processing, wherein the i-th set of first gating parameters includes N first gating parameters in one-to-one correspondence with the N training data, the i-th first gating parameter is located within a first value range, and the other first gating parameters are located within a second value range; and training the model to be trained based on the N training data after the i-th first processing to obtain the i-th first model.

3. The method of claim 1, wherein, The process of training the model to be trained based on the N training data and the i-th set of second gating parameters to obtain the i-th second model includes: Based on the i-th group of second gating parameters, the N training data are subjected to the i-th second processing to obtain the N training data after the i-th second processing. The i-th group of second gating parameters includes N second gating parameters that correspond one-to-one with the N training data. The i-th second gating parameter is located in the second value range, and the other second gating parameters are located in the first value range. Based on the N training data after the i-th second processing, the model to be trained is trained to obtain the i-th second model.

4. The method of claim 2, wherein, The step of training the model to be trained based on the N training data after the i-th first processing to obtain the i-th first model includes: The N training data after the i-th first processing are input into the model to be trained to obtain the i-th first processing result; Based on the i-th first processing result and the i-th first learning rate, the parameters of the model to be trained are updated to obtain the i-th first model. The i-th first learning rate is used to ensure that the difference between the performance of the i-th first model and the performance of the model to be trained is within a third value range.

5. The method of claim 3, wherein, The step of training the model to be trained based on the N training data after the i-th second processing to obtain the i-th second model includes: The N training data after the i-th second processing are input into the model to be trained to obtain the i-th second processing result; Based on the i-th second processing result and the i-th second learning rate, the parameters of the model to be trained are updated to obtain the i-th second model. The i-th second learning rate is used to ensure that the difference between the performance of the i-th second model and the performance of the model to be trained is within a third value range.

6. The method of claim 1, wherein, The process of processing the N new training data using the i-th first model to obtain the upper limit of the evaluation value of the i-th training data includes: The N new training data are input into the i-th first model to obtain the i-th third processing result; The remaining new training data, excluding the i-th new training data, and the i-th new training data after perturbation are input into the i-th first model to obtain the i-th fourth processing result; Based on the i-th third processing result and the i-th fourth processing result, the upper limit of the evaluation value of the i-th training data is obtained.

7. The method of claim 6, wherein, The process of processing the N new training data using the i-th second model to obtain the lower bound of the evaluation value of the i-th training data includes: The N new training data are input into the i-th second model to obtain the i-th fifth processing result; The remaining new training data, excluding the i-th new training data, and the i-th new training data after perturbation are input into the i-th second model to obtain the i-th sixth processing result; Based on the i-th fifth processing result and the i-th sixth processing result, the lower limit of the evaluation value of the i-th training data is obtained.

8. The method according to any one of claims 1 to 7, characterized in that, The method further includes: Based on the evaluation values of the N training data, select M training data from the N training data, where N≥M≥1; Based on the M training data, the parameters of the model to be trained are updated until the model training conditions are met, thus obtaining the third model.

9. A training data evaluation apparatus, characterized in that the apparatus includes: an acquisition module, used to acquire N training data, N sets of first gating parameters, and N sets of second gating parameters, where N≥2; a first training module, used to train the model to be trained based on the N training data and the i-th set of first gating parameters to obtain the i-th first model, wherein the i-th set of first gating parameters is used to make the i-th first model dependent on the i-th training data, i=1,...,N; a second training module, used to train the model to be trained based on the N training data and the i-th set of second gating parameters to obtain the i-th second model, wherein the i-th set of second gating parameters is used to make the i-th second model independent of the i-th training data; and an evaluation module, used to obtain an evaluation value for the i-th training data based on the i-th first model and the i-th second model, wherein the evaluation value of the i-th training data is used to indicate the importance of the i-th training data; wherein the evaluation module is used to: obtain N new training data in one-to-one correspondence with the N training data, wherein the N training data originate from a first dataset and the N new training data originate from a second dataset; process the N new training data using the i-th first model to obtain the upper limit of the evaluation value of the i-th training data; process the N new training data using the i-th second model to obtain the lower limit of the evaluation value of the i-th training data; and obtain the evaluation value of the i-th training data based on the upper limit and the lower limit of the evaluation value of the i-th training data.

10. The apparatus of claim 9, wherein the first training module is used to: subject the N training data to the i-th first processing based on the i-th set of first gating parameters to obtain the N training data after the i-th first processing, wherein the i-th set of first gating parameters includes N first gating parameters in one-to-one correspondence with the N training data, the i-th first gating parameter is located within a first value range, and the other first gating parameters are located within a second value range; and train the model to be trained based on the N training data after the i-th first processing to obtain the i-th first model.

11. The apparatus according to claim 9, characterized in that, The second training module is used for: Based on the i-th group of second gating parameters, the N training data are subjected to the i-th second processing to obtain the N training data after the i-th second processing. The i-th group of second gating parameters includes N second gating parameters that correspond one-to-one with the N training data. The i-th second gating parameter is located in the second value range, and the other second gating parameters are located in the first value range. Based on the N training data after the i-th second processing, the model to be trained is trained to obtain the i-th second model.

12. The apparatus according to claim 10, characterized in that, The first training module is used for: The N training data after the i-th first processing are input into the model to be trained to obtain the i-th first processing result; Based on the i-th first processing result and the i-th first learning rate, the parameters of the model to be trained are updated to obtain the i-th first model. The i-th first learning rate is used to ensure that the difference between the performance of the i-th first model and the performance of the model to be trained is within a third value range.

13. The apparatus of claim 11, wherein, The second training module is used for: The N training data after the i-th second processing are input into the model to be trained to obtain the i-th second processing result; Based on the i-th second processing result and the i-th second learning rate, the parameters of the model to be trained are updated to obtain the i-th second model. The i-th second learning rate is used to ensure that the difference between the performance of the i-th second model and the performance of the model to be trained is within a third value range.

14. The apparatus of claim 9, wherein, The evaluation module is used for: The N new training data are input into the i-th first model to obtain the i-th third processing result; The remaining new training data, excluding the i-th new training data, and the i-th new training data after perturbation are input into the i-th first model to obtain the i-th fourth processing result; Based on the i-th third processing result and the i-th fourth processing result, the upper limit of the evaluation value of the i-th training data is obtained.

15. The apparatus of claim 14, wherein, The evaluation module is used for: The N new training data are input into the i-th second model to obtain the i-th fifth processing result; The remaining new training data, excluding the i-th new training data, and the i-th new training data after perturbation are input into the i-th second model to obtain the i-th sixth processing result; Based on the i-th fifth processing result and the i-th sixth processing result, the lower limit of the evaluation value of the i-th training data is obtained.

16. The apparatus of any one of claims 9 to 15, wherein, The device further includes: The selection module is used to select M training data from the N training data based on the evaluation values of the N training data, where N≥M≥1; The third training module is used to update the parameters of the model to be trained based on the M training data until the model training conditions are met, and thus obtain the third model.

17. A training data evaluation apparatus, characterized in that the apparatus includes a memory and a processor; the memory stores code, and the processor is configured to execute the code, wherein when the code is executed, the training data evaluation apparatus performs the method as described in any one of claims 1 to 8.

18. A computer storage medium, characterized in that the computer storage medium stores one or more instructions that, when executed by one or more computers, cause the one or more computers to perform the method of any one of claims 1 to 8.

19. A computer program product, characterised in that, The computer program product stores instructions that, when executed by a computer, cause the computer to perform the method described in any one of claims 1 to 8.

Citation Information

Patent Citations

Method and device for predicting antibiotic resistance phenotype based on machine learning and application
CN114388062A
Federal learning-based data value evaluation method and related equipment thereof
CN115238909A
