Model data processing method and device, computer device, readable storage medium and program product
By identifying the feature distance relationship between teacher and student models, determining the sample distribution loss parameter, and optimizing the student model, the problem of low knowledge transfer efficiency in sample data distribution in existing technologies is solved, and the processing efficiency of image recognition is improved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date
- 2024-12-18
- Publication Date
- 2026-06-19
AI Technical Summary
Existing knowledge distillation methods fail to effectively transfer model knowledge of sample data distribution during image recognition, resulting in low computer execution efficiency.
By identifying the feature distance relationship between the sample features extracted by the teacher model and the student model respectively, the sample distribution loss parameter of the relative distance of the samples is determined, and the student model is optimized based on this, constraining the feature distribution of the student model to be consistent with that of the teacher model.
It improves the processing efficiency of the model knowledge distillation process and enhances the overall efficiency of the image recognition process, making it particularly suitable for feature ranking tasks.
Figure CN122244484A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to a model data processing method, apparatus, computer equipment, computer-readable storage medium, and computer program product.
Background Technology
[0002] With the development of computer technology and artificial intelligence, model knowledge distillation has emerged. Model knowledge distillation is a commonly used technique in tasks such as model parameter compression, continual learning, and unsupervised learning. It transfers knowledge from the teacher model to the student model by constraining the student model's output to match the teacher model's output. Current knowledge distillation methods typically constrain the classification results of the student and teacher models to be consistent, or reduce the L1 distance between their feature vectors, to achieve knowledge transfer.
[0003] For example, for image recognition ranking models, current knowledge distillation methods only ensure that the student model's output for a single sample is consistent with that of the teacher model, ignoring the distance relationship between samples. As a result, model knowledge about the distribution of sample data cannot be transferred, which leads to low knowledge transfer efficiency and poor model knowledge distillation performance. This affects the processing efficiency of the computer in performing the model knowledge distillation process, and consequently reduces the computer's execution efficiency in the image recognition process.
Summary of the Invention
[0004] Therefore, it is necessary to provide a model data processing method, apparatus, computer equipment, computer-readable storage medium, and computer program product that can improve the processing efficiency of computers in the image recognition process, in order to address the above-mentioned technical problems.
[0005] In a first aspect, this application provides a model data processing method, including:
[0006] Based on the teacher model and the student model, image feature extraction processing is performed on the input sample image set to obtain a first sample feature dataset corresponding to the teacher model and a second sample feature dataset corresponding to the student model.
[0007] Identify the first feature distance relationship between image features in the first sample feature dataset and the second feature distance relationship between image features in the second sample feature dataset, respectively.
[0008] Based on the first feature distance relationship and the second feature distance relationship, determine the sample distribution loss parameter of the relative distance of samples;
[0009] The student model is optimized based on the sample distribution loss parameters to obtain a target image recognition model, which is used to recognize and process the input image.
[0010] In a second aspect, this application also provides a model data processing apparatus, comprising:
[0011] The feature extraction module is used to perform image feature extraction processing on the input sample image set based on the teacher model and the student model respectively, to obtain a first sample feature dataset corresponding to the teacher model and a second sample feature dataset corresponding to the student model;
[0012] The distance recognition module is used to identify the first feature distance relationship between image features in the first sample feature dataset and the second feature distance relationship between image features in the second sample feature dataset, respectively.
[0013] The loss calculation module is used to determine the sample distribution loss parameters of the relative distance between samples based on the first feature distance relationship and the second feature distance relationship;
[0014] The model optimization module is used to optimize the student model based on the sample distribution loss parameters to obtain the target image recognition model, which is used to perform recognition processing on the input image.
[0015] In a third aspect, this application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to perform the following steps:
[0016] Based on the teacher model and the student model, image feature extraction processing is performed on the input sample image set to obtain a first sample feature dataset corresponding to the teacher model and a second sample feature dataset corresponding to the student model.
[0017] Identify the first feature distance relationship between image features in the first sample feature dataset and the second feature distance relationship between image features in the second sample feature dataset, respectively.
[0018] Based on the first feature distance relationship and the second feature distance relationship, determine the sample distribution loss parameter of the relative distance of samples;
[0019] The student model is optimized based on the sample distribution loss parameters to obtain a target image recognition model, which is used to recognize and process the input image.
[0020] In a fourth aspect, this application also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the following steps:
[0021] Based on the teacher model and the student model, image feature extraction processing is performed on the input sample image set to obtain a first sample feature dataset corresponding to the teacher model and a second sample feature dataset corresponding to the student model.
[0022] Identify the first feature distance relationship between image features in the first sample feature dataset and the second feature distance relationship between image features in the second sample feature dataset, respectively.
[0023] Based on the first feature distance relationship and the second feature distance relationship, determine the sample distribution loss parameter of the relative distance of samples;
[0024] The student model is optimized based on the sample distribution loss parameters to obtain a target image recognition model, which is used to recognize and process the input image.
[0025] In a fifth aspect, this application also provides a computer program product, including a computer program that, when executed by a processor, performs the following steps:
[0026] Based on the teacher model and the student model, image feature extraction processing is performed on the input sample image set to obtain a first sample feature dataset corresponding to the teacher model and a second sample feature dataset corresponding to the student model.
[0027] Identify the first feature distance relationship between image features in the first sample feature dataset and the second feature distance relationship between image features in the second sample feature dataset, respectively.
[0028] Based on the first feature distance relationship and the second feature distance relationship, determine the sample distribution loss parameter of the relative distance of samples;
[0029] The student model is optimized based on the sample distribution loss parameters to obtain a target image recognition model, which is used to recognize and process the input image.
[0030] The aforementioned model data processing method, apparatus, computer equipment, computer-readable storage medium, and computer program product first perform image feature extraction processing on the input sample image set based on the teacher model and the student model respectively, to obtain a first sample feature dataset corresponding to the teacher model and a second sample feature dataset corresponding to the student model, thus obtaining the basic data for the model distillation process. The first feature distance relationship between image features in the first sample feature dataset and the second feature distance relationship between image features in the second sample feature dataset are identified respectively. Based on the first and second feature distance relationships, a sample distribution loss parameter for the relative distance of samples is determined. This parameter constrains the distribution of sample features extracted by the student model to maintain structural consistency with the distribution of sample features extracted by the teacher model. Finally, the student model is optimized based on the sample distribution loss parameter to obtain a target image recognition model. This model is used to recognize and process the input image, efficiently transferring knowledge from the teacher model to the student model. This application extracts sample features through a teacher model and a student model, and obtains sample distribution loss parameters based on the distance relationship between sample features. 
By constraining the knowledge distillation process through the sample distribution loss parameters, the student model learns the feature distribution structure of the teacher model, thereby more efficiently transferring the knowledge of the teacher model, improving the processing efficiency of the computer in executing the model knowledge distillation process, and thus improving the efficiency of the entire image recognition process.
Attached Figure Description
[0031] To more clearly illustrate the technical solutions in the embodiments of this application or related technologies, the drawings used in the description of the embodiments of this application or related technologies will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0032] Figure 1 is an application environment diagram of the model data processing method in one embodiment;
[0033] Figure 2 is a flowchart illustrating a model data processing method in one embodiment;
[0034] Figure 3 is a flowchart illustrating the student model optimization steps in one embodiment;
[0035] Figure 4 is a flowchart illustrating the student model optimization steps in another embodiment;
[0036] Figure 5 is a flowchart illustrating an in-vehicle image method in one embodiment;
[0037] Figure 6 is a schematic diagram of the model iterative optimization training process in one embodiment;
[0038] Figure 7 is a schematic diagram of the system architecture for model data processing in one embodiment;
[0039] Figure 8 is a flowchart illustrating the model data processing method in another embodiment;
[0040] Figure 9 is a structural block diagram of a model data processing device in one embodiment;
[0041] Figure 10 is an internal structural diagram of a computer device in one embodiment.
Detailed Implementation
[0042] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0043] The model data processing method provided in the embodiments of this application can be applied to the application environment shown in Figure 1, in which terminal 102 communicates with server 104 via a network. A data storage system can store the data that server 104 needs to process. The data storage system can be integrated into server 104 or placed on a cloud or other network server. When a user on terminal 102 wants to simplify a machine learning model, they can use knowledge distillation to replace the existing teacher model with a structurally simpler student model. In this case, the user can use the method of this application to perform knowledge distillation on the model. The user can submit a corresponding processing request to server 104 through terminal 102, specifying the teacher model, student model, and sample dataset in the request. Server 104 performs image feature extraction processing on the input sample image set based on the teacher model and the student model respectively, obtaining a first sample feature dataset corresponding to the teacher model and a second sample feature dataset corresponding to the student model. It identifies the first feature distance relationship between image features in the first sample feature dataset and the second feature distance relationship between image features in the second sample feature dataset. Based on the first and second feature distance relationships, it determines the sample distribution loss parameter of the relative distance between samples. Based on the sample distribution loss parameter, it optimizes the student model to obtain a target image recognition model, which is used to recognize the input image. The terminal 102 can be, but is not limited to, various personal computers, laptops, smartphones, tablets, IoT devices, and portable wearable devices. 
IoT devices can include smart speakers, smart TVs, smart air conditioners, smart in-vehicle systems, and projection devices. Portable wearable devices can include smartwatches, smart bracelets, and head-mounted displays. Head-mounted displays can be virtual reality (VR) devices, augmented reality (AR) devices, and smart glasses. The server 104 can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services.
[0044] In one exemplary embodiment, as shown in Figure 2, a model data processing method is provided. Taking the application of the method to server 104 in Figure 1 as an example, the method includes the following steps 201 to 207:
[0045] Step 201: Based on the teacher model and the student model, perform image feature extraction processing on the input sample image set to obtain the first sample feature dataset corresponding to the teacher model and the second sample feature dataset corresponding to the student model.
[0046] In knowledge distillation, the teacher model and student model are two concepts. The teacher model is typically a well-trained, high-precision pre-trained model with a large number of parameters and a complex structure, capable of capturing rich features and knowledge. The student model is usually a smaller model with fewer parameters and a simpler structure. It improves its performance by learning the knowledge from the teacher model. Knowledge distillation is a model learning method that uses a teacher model to provide supervision signals to train the student model. Typically, the student model's output is constrained to approach the teacher model's output, thereby transferring knowledge from the teacher model to the student model. The input sample image set is the collection of sample data used to train the model during the knowledge distillation process. The specific sample data in the dataset can be selected based on the model's application domain. For example, when applied to image classification, the input sample image set can be constructed based on historical image data. Feature extraction refers to the process of identifying and extracting representative and informative features from raw data in machine learning, pattern recognition, and image processing. These features are simplified representations of the raw data, retaining the most relevant aspects of the data, and can be numbers, binary values, or categories. This application specifically uses feature vectors to represent the feature data extracted from the input sample image set.
[0047] For example, after a user on terminal 102 has trained a machine learning model with a large amount of data and determined that the model has achieved high accuracy, they may find that the current machine learning model is too complex and that the model parameters affect processing efficiency. In this case, the user can consider simplifying the model through knowledge distillation to reduce model parameters and improve processing efficiency. To further improve the computer processing efficiency of the model distillation process and shorten processing time, the user can send a request to server 104 through terminal 102. Server 104 will then perform efficient model distillation. After determining the information of the teacher model and student model and obtaining the input sample dataset, server 104 will first use the teacher model to perform image feature extraction processing on the input sample image set to obtain a first sample feature dataset corresponding to the teacher model. Simultaneously, the student model will perform image feature extraction processing on the same input sample image set to obtain a second sample feature dataset corresponding to the student model. In one embodiment, this application is applied to the classification processing of human target images. In this case, the human target image dataset can be used as a training set. First, a massive human target image dataset is collected to pre-train the classification model to obtain the teacher model. Then, a portion of the human target image dataset is selected as the input sample image set, and a lightweight student model is constructed. The input sample image set is then input into the teacher model and the student model respectively for knowledge distillation.
[0048] Step 203: Identify the first feature distance relationship between image features in the first sample feature dataset and the second feature distance relationship between image features in the second sample feature dataset.
[0049] Feature distance, in this context, refers to a metric used to measure the similarity or difference between different samples. In machine learning and data mining, the calculation methods for feature distance are crucial because they can be used in tasks such as clustering, classification, and anomaly detection. Feature distance primarily measures the similarity or difference between data points. By calculating the feature distance between different samples, their degree of similarity can be determined, thus enabling applications in various data analysis tasks. In the scheme of this application, feature distance is used to represent the relative distance between samples.
[0050] For example, the applicant found that existing model knowledge distillation methods only consider feature preservation for a single sample or the preservation of distance relationships between two or three samples, failing to adequately consider distance relationships between more samples. This results in low efficiency in transferring knowledge of inter-sample distance relationships and makes them unsuitable for tasks requiring feature ranking, such as image retrieval ranking and image-text search. Therefore, this application proposes a model data processing method to transfer knowledge of sample distribution relationships from the teacher model to the student model. To transfer these relationships, it is first necessary to identify the sample distribution relationships of the features extracted by the teacher and student models, i.e., to identify the first feature distance relationship between image features in the first sample feature dataset and the second feature distance relationship between image features in the second sample feature dataset. Specifically, for each image feature data in the sample feature dataset, its relative distance to other feature data in the dataset can be identified to obtain the first feature distance relationship. In the specific implementation, the K-nearest neighbor algorithm can be used to determine the feature distance relationship.
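The idea of identifying, for each feature, its relative distance to the other features in the dataset can be illustrated with a minimal sketch. The function names, the toy vectors, and the choice of Euclidean distance are assumptions made for illustration; the text does not fix a particular distance metric.

```python
import math

def pairwise_distances(features):
    """Return the full matrix of Euclidean distances between feature vectors."""
    return [[math.dist(a, b) for b in features] for a in features]

def neighbors_by_distance(features, i):
    """Indices of all other samples, sorted by ascending distance to sample i."""
    row = pairwise_distances(features)[i]
    others = [j for j in range(len(features)) if j != i]
    return sorted(others, key=lambda j: row[j])

# Hypothetical 2-D features standing in for extracted image features.
toy_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
print(neighbors_by_distance(toy_features, 0))
```

Running the same ranking on the teacher features and on the student features gives the two orderings whose agreement the method aims to enforce.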
[0051] Step 205: Based on the first feature distance relationship and the second feature distance relationship, determine the sample distribution loss parameter of the relative distance of samples;
[0052] Step 207: Optimize the student model based on the sample distribution loss parameter to obtain the target image recognition model, which is used to recognize and process the input image.
[0053] For example, after obtaining the first and second feature distance relationships, this application further proposes that the sample distance relationships obtained by the student model should be kept consistent with those obtained by the teacher model. To constrain the feature distance relationships, a sample distribution loss parameter for the relative distance of samples can be constructed from the first and second feature distance relationships. In a specific embodiment, the sample distribution loss parameter can be calculated as a cross-entropy loss. Finally, the student model is optimized based on the sample distribution loss parameter to obtain the target image recognition model. In addition to the sample distribution loss parameter, the optimization of the student model can use classification loss functions, metric learning loss functions, and other auxiliary training methods, chosen according to the model type and application scenario, to ensure model performance. Once the target image recognition model is obtained, image recognition processing can be performed with it while ensuring processing efficiency. In one embodiment, the student model optimization process is as shown in Figure 3: the image sample dataset is input into the student model and the teacher model respectively, image features are extracted, and the distance relationships between features are identified. The distance relationships between features in the student and teacher models are then compared, thereby achieving knowledge distillation from the perspective of sample distribution structure and optimizing the student model. Furthermore, this training process can be iterated multiple times to continuously optimize the student model.
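As an illustration of using cross-entropy to compare the teacher's and the student's distance relationships, the following minimal sketch treats each sample's relationship to its neighbors as a probability distribution. The function names, the averaging over samples, the `eps` smoothing term, and the toy distributions are assumptions; the text only states that a cross-entropy loss can be used.

```python
import math

def cross_entropy(teacher_probs, student_probs, eps=1e-12):
    """Cross-entropy -sum_j p_j * log(q_j) for one sample's neighbor distribution."""
    return -sum(p * math.log(q + eps) for p, q in zip(teacher_probs, student_probs))

def sample_distribution_loss(teacher_rows, student_rows):
    """Average the per-sample cross-entropy over all samples (averaging is assumed)."""
    losses = [cross_entropy(p, q) for p, q in zip(teacher_rows, student_rows)]
    return sum(losses) / len(losses)

# Hypothetical relative-distance distributions for two samples, three neighbors each.
teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
matching_student = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
mismatched_student = [[0.1, 0.2, 0.7], [0.8, 0.1, 0.1]]

print(sample_distribution_loss(teacher, matching_student))
print(sample_distribution_loss(teacher, mismatched_student))
```

The loss is smallest when the student's distributions equal the teacher's, so minimizing it pulls the student's neighbor structure toward the teacher's.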
[0054] The aforementioned model data processing method first performs image feature extraction processing on the input sample image set based on the teacher model and the student model respectively, to obtain a first sample feature dataset corresponding to the teacher model and a second sample feature dataset corresponding to the student model. This provides the basic data for the model distillation process. The method then identifies the first feature distance relationship between image features in the first sample feature dataset and the second feature distance relationship between image features in the second sample feature dataset. Based on the first and second feature distance relationships, a sample distribution loss parameter for relative sample distance is determined. This parameter ensures that the distribution of sample features extracted by the student model maintains structural consistency with the distribution of sample features extracted by the teacher model. Finally, the student model is optimized based on the sample distribution loss parameter to obtain the target image recognition model. This target image recognition model is then used to recognize and process the input image, efficiently transferring knowledge from the teacher model to the student model. This application extracts sample features through a teacher model and a student model, and obtains sample distribution loss parameters based on the distance relationship between sample features. By constraining the knowledge distillation process through the sample distribution loss parameters, the student model learns the feature distribution structure of the teacher model, thereby more efficiently transferring the knowledge of the teacher model, improving the processing efficiency of the computer in executing the model knowledge distillation process, and thus improving the efficiency of the entire image recognition processing process.
[0055] In an exemplary embodiment, identifying the first feature distance relationship between image features in the first sample feature dataset includes: constructing a K nearest neighbor sample set for each image feature data in the first sample feature dataset; determining the feature similarity between each image feature data and the corresponding sample feature data in the K nearest neighbor sample set; determining the first feature distance relationship corresponding to the image feature data based on the feature similarity; and summarizing the first feature distance relationships of all image feature data to obtain the first feature distance relationship between image features in the first sample feature dataset.
[0056] In this context, the K-nearest neighbor (KNN) samples are those obtained through classification using the KNN algorithm. The KNN algorithm, given a training dataset, finds the K nearest neighbors to a new input instance in the training dataset. If the majority of these K neighbors belong to a certain class, the input instance is classified into that class. For each sample feature data in the first sample feature dataset, the KNN algorithm can find its K nearest neighbors. Feature similarity is a metric that measures the degree of similarity between different features. In machine learning, the calculation method of feature similarity and the weighting of features have a significant impact on the algorithm's performance. The calculation of feature similarity is the core of the KNN algorithm; it determines the distance between sample data points, thus affecting the accuracy of predictions.
[0057] For example, this application can use the K-nearest neighbor algorithm to classify the features in a sample feature dataset, thereby identifying the first feature distance relationship between features. When applying the K-nearest neighbor algorithm, a K-nearest-neighbor sample set is first constructed for each image feature data in the first sample feature dataset. Here, the first sample feature dataset corresponding to the teacher model is the set of features extracted by the teacher model. For each feature extracted by the teacher model, its K-nearest-neighbor sample set is selected from the teacher model features: the other features are sorted by their distance to that feature in ascending order, and the first K features in that order form the set. Next, the feature similarity between each image feature data and each feature in its K-nearest-neighbor sample set is calculated. Specifically, for features in the first sample feature dataset that do not belong to the K nearest neighbors of a given feature, and therefore lie at a large distance from it, the similarity is recorded as 0, because changes in the relative distance between such distant sample pairs do not affect model performance. The similarity of a feature to itself is recorded as 1. Then, based on the feature similarity, the first feature distance relationship corresponding to the image feature data is determined; that is, the distance relationship between samples is measured by the relative similarity between features. Finally, the first feature distance relationships of all feature data are summarized to obtain the first feature distance relationship between image features in the first sample feature dataset. 
In this embodiment, the K-nearest neighbor algorithm is used to constrain the distance ratio between sample nearest neighbors calculated by the student model and the teacher model to be consistent. This can simultaneously optimize the distance relationship between more samples in the nearest neighbor region, and has a more comprehensive constraint on the distance relationship between samples, thereby improving the model's knowledge transfer efficiency and making it more suitable for feature ranking tasks.
[0058] Further, the process for determining the feature similarity between each image feature data and the feature data in the corresponding K-nearest-neighbor sample set includes: determining the feature vector of the image feature data and the feature vector of the sample feature data; and determining the feature similarity between each image feature data and the feature data in the corresponding K-nearest-neighbor sample set based on the dot product of the two feature vectors. The dot product, also known as the inner product, is an operation on two vectors that multiplies their corresponding elements and adds the results. The result is a scalar (a single number) that can be used to measure how closely the directions of the two vectors agree. A larger dot product indicates greater similarity in direction; a zero dot product indicates perpendicularity; and a negative dot product indicates opposite directions. This application determines the similarity between vectors through the dot product. First, the feature vector of the feature data is determined, together with the feature vectors of the sample feature data in the K-nearest-neighbor sample set corresponding to that feature data. Then, the dot product of the two vectors is taken as the feature similarity between the image feature data and the feature data in its K-nearest-neighbor sample set. Calculating feature similarity as the dot product of feature vectors can effectively improve the efficiency of feature computation.
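The construction of the K-nearest-neighbor set and the similarity rules described above (dot product inside the set, 0 outside it, 1 for the sample itself) can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function names, the use of Euclidean distance for ranking, and the unit-length toy vectors are all assumptions.

```python
import math

def dot(a, b):
    """Dot product: multiply corresponding elements and add the results."""
    return sum(x * y for x, y in zip(a, b))

def knn_indices(features, i, k):
    """Indices of the k samples closest to sample i (Euclidean distance, excluding i)."""
    others = [j for j in range(len(features)) if j != i]
    others.sort(key=lambda j: math.dist(features[i], features[j]))
    return others[:k]

def knn_similarity_row(features, i, k):
    """Similarity of sample i to every sample: dot product inside its KNN set,
    0 outside the set, and 1 for the sample itself."""
    neighbors = set(knn_indices(features, i, k))
    row = []
    for j in range(len(features)):
        if j == i:
            row.append(1.0)
        elif j in neighbors:
            row.append(dot(features[i], features[j]))
        else:
            row.append(0.0)
    return row

# Hypothetical unit-length feature vectors.
toy = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0], [-1.0, 0.0]]
print(knn_indices(toy, 0, 2))
print(knn_similarity_row(toy, 0, 2))
```

Note that a neighbor inside the set can still have a similarity of 0 when its vector is perpendicular to the query vector, as happens for the third toy vector here.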
[0059] Furthermore, the construction process of the K-nearest neighbor sample set specifically includes: obtaining the sample distribution data of the first sample feature dataset; determining the K-nearest neighbor value based on the sample distribution data; and constructing the K-nearest neighbor sample set for each image feature data in the first sample feature dataset based on the K-nearest neighbor value. The K-nearest neighbor value is a positive integer representing the number of nearest neighbor samples considered during prediction. The choice of K value directly affects the complexity of the model and the accuracy of the prediction results. A smaller K value increases the complexity of the model and may lead to overfitting; a larger K value simplifies the model and may lead to underfitting. In this application, the K-nearest neighbor value can be determined through the sample distribution of the feature dataset. After feature extraction, the value of K can be set according to different data distributions. After determining the K-nearest neighbor value, the K-nearest neighbor sample set for each image feature data in the first sample feature dataset is constructed based on the K-nearest neighbor value. That is, for each feature data, the feature corresponding to the number of K-nearest neighbor values is selected to form the K-nearest neighbor sample set for each image feature data. In this embodiment, selecting the K-nearest neighbor value based on the sample distribution data of the sample feature dataset and then constructing the corresponding K-nearest neighbor sample set can effectively ensure the efficiency and accuracy of sample construction.
[0060] In an exemplary embodiment, determining the first feature distance relationship corresponding to the image feature data based on feature similarity includes: summarizing the feature similarities in the K nearest neighbor sample set of the image feature data to obtain the total similarity data; and determining the first feature distance relationship corresponding to the image feature data based on the ratio of each feature similarity to the total similarity data.
[0061] For example, after the feature similarity is determined, a similarity value is recorded for each ordered pair of features in the first sample feature dataset. Since the similarity is determined by the KNN relationship, the similarity of feature i to feature j need not equal the similarity of feature j to feature i. For a feature in the first sample feature dataset that does not belong to the K nearest neighbors of a given feature, its similarity to that feature is recorded as 0, because for sample pairs that are far apart, changes in their relative distance do not affect model performance. The similarity of a feature to itself is recorded as 1. In other words, if feature j is in the KNN of feature i, the similarity is the dot product of the two feature vectors; otherwise, the similarity is 0. After the feature similarities in the K nearest neighbor sample set of a feature data are determined, they can be summed to obtain the total similarity data. The first feature distance relationship corresponding to the image feature data is then determined based on the ratio of each feature similarity to the total similarity data. This process satisfies the formula:
[0062]
[0063] Here, exp(x) represents the natural exponential function, and β is a hyperparameter, typically set to 0.05. This transforms the distance between two features into a relative distance between samples, accommodating different sample distribution densities. The second feature distance relationship in the student model can be calculated with the same formula. That is, within the feature set extracted by the student model, the K-nearest neighbor relationship is likewise used to characterize the sample distribution structure. For each feature extracted by the student model, the set of its K nearest neighbors among the student model features is selected, that is, the K features ranked first when the student model features are sorted in ascending order of distance to that feature. The relative similarity between features is then calculated using the formula described above. In this embodiment, relative similarity between features is used to measure the distance relationship between samples, thereby avoiding the influence of variations in the sparsity of the feature distribution on the absolute value of feature similarity and effectively improving the accuracy of the sample distribution loss calculation during knowledge distillation. The relative distance representation proposed in this application can adapt to situations where the output feature scales of the student model and the teacher model are inconsistent, making the model training process more robust and stable. Furthermore, this constraint reduces computational complexity while still constraining the global distance relationships between samples.
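The relative similarity can be sketched in NumPy as below. The exact formula is not reproduced in this text, so the form used here is an assumption consistent with the description: each similarity is passed through exp(x) after division by β, then divided by the total over the K nearest neighbor set, and pairs outside the set receive 0. The function name `relative_similarity` and the toy inputs are hypothetical.

```python
import numpy as np

def relative_similarity(features, neighbours, beta=0.05):
    """Relative similarity over each feature's K nearest neighbours.

    Assumed form (not confirmed by the source):
        p[i, j] = exp(s_ij / beta) / sum_k exp(s_ik / beta)
    with j and k ranging over the neighbour set of i, and p[i, j] = 0 elsewhere.
    """
    n = features.shape[0]
    rows = np.arange(n)[:, None]
    # Dot product of each feature with each of its neighbours, shape (n, k).
    sims = np.einsum("id,ikd->ik", features, features[neighbours])
    logits = sims / beta
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability only
    weights = np.exp(logits)
    probs = weights / weights.sum(axis=1, keepdims=True)
    dense = np.zeros((n, n))
    dense[rows, neighbours] = probs  # non-neighbour pairs stay at 0
    return dense

# Hypothetical unit-length features and their precomputed neighbour sets (K = 2).
angles = np.deg2rad([0.0, 10.0, 25.0, 90.0, 100.0])
features = np.stack([np.cos(angles), np.sin(angles)], axis=1)
neighbours = np.array([[1, 2], [0, 2], [1, 0], [4, 2], [3, 2]])
relative = relative_similarity(features, neighbours)
```

Because each row is normalized by its own total, the values depend on how a neighbor compares with the other neighbors of the same sample rather than on the absolute similarity, which is the property the paragraph above relies on.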
[0064] In an exemplary embodiment, step 205 includes: obtaining the sample distribution loss parameter of the relative distance of samples by traversing the first feature distance and the second feature distance through the cross-entropy loss function.
[0065] For example, the cross-entropy loss function is a function that optimizes a machine learning model using cross-entropy. Cross-entropy is an important concept in information theory, measuring the difference between two different probability distributions of the same random variable; in machine learning, it is represented as the difference between the true probability distribution and the predicted probability distribution. This application proposes to constrain the inter-sample distance relationship obtained by the student model to be consistent with that obtained by the teacher model. Therefore, this application proposes to use cross-entropy to constrain the output of the student model, and the specific loss parameter calculation method satisfies the formula:
[0066]
[0067] The above formula iterates through all the distance relationships between samples. The number of pairwise distances grows with the square of the number of samples, i.e., on the order of n² for n samples. However, each sample has non-zero relative similarity only with its K nearest neighbors, so for a fixed K the computational complexity of the above formula is O(n). This application can therefore constrain the global distance relationship between samples with relatively low computational complexity. In this embodiment, based on the K-nearest neighbor constraint, the cross-entropy loss function is used to calculate the sample distribution loss parameter, which avoids a computational cost that increases quadratically with the number of samples, effectively improves the processing efficiency of the loss calculation, and thus improves the overall efficiency of the knowledge distillation process.
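A minimal sketch of a cross-entropy sample distribution loss restricted to neighbor pairs follows. Several points are assumptions rather than details from this application: the temperature-softmax form of the relative similarity, evaluating both the teacher and the student distributions on the teacher's neighbor sets (so that every logarithm is finite), averaging over samples, and all function names.

```python
import numpy as np

def neighbour_softmax(features, neighbours, beta=0.05):
    """Relative similarity of each feature to its neighbours, shape (n, k)."""
    sims = np.einsum("id,ikd->ik", features, features[neighbours])
    logits = sims / beta
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def sample_distribution_loss(teacher, student, neighbours, beta=0.05, eps=1e-12):
    """Cross-entropy between teacher and student relative similarities.

    Only n * k neighbour pairs are visited, never the full n * n matrix.
    """
    p_teacher = neighbour_softmax(teacher, neighbours, beta)
    p_student = neighbour_softmax(student, neighbours, beta)
    return float(-(p_teacher * np.log(p_student + eps)).sum() / len(teacher))

def unit_circle(degrees):
    """Hypothetical helper: unit-length 2-D features at the given angles."""
    radians = np.deg2rad(degrees)
    return np.stack([np.cos(radians), np.sin(radians)], axis=1)

teacher = unit_circle([0.0, 10.0, 25.0, 90.0, 100.0])
neighbours = np.array([[1, 2], [0, 2], [1, 0], [4, 2], [3, 2]])  # teacher KNN, K = 2

loss_matching = sample_distribution_loss(teacher, teacher.copy(), neighbours)
# A student whose second feature has drifted away from its teacher position.
loss_drifted = sample_distribution_loss(
    teacher, unit_circle([0.0, 40.0, 25.0, 90.0, 100.0]), neighbours)
```

The loss is smallest when the student reproduces the teacher's relative similarities, so the drifted student receives a larger value than the matching one.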
[0068] In an exemplary embodiment, the teacher model and the student model include classification models. Step 207 includes: determining the sample classification result of the student model based on the second sample feature dataset; determining the classification loss parameter of the student model based on the sample classification result; determining the knowledge distillation loss parameter based on the sample distribution loss parameter and the classification loss parameter; and optimizing the student model based on the knowledge distillation loss parameter to obtain the target image recognition model.
[0069] For example, in one embodiment, this application can be specifically applied to the optimization of classification models, where both the teacher model and the student model are classification models. In this case, a classification loss parameter can be introduced and combined with the sample distribution loss parameter to calculate the loss. To calculate the classification loss parameter, the sample classification result of the student model needs to be determined based on the second sample feature dataset. After the second sample feature dataset is extracted by the student model, a classifier can be used to classify each sample feature data in the second sample feature dataset, and the resulting classification probabilities serve as the sample classification result. By comparing the sample labels with the sample classification results, the classification loss parameter of the student model can be determined. In a specific embodiment, the classification loss parameter can be calculated using the cross-entropy loss function. After the classification loss parameter is obtained, the total knowledge distillation loss parameter can be determined based on the sample distribution loss parameter, the classification loss parameter, and the weight values pre-assigned to the different loss parameters. These weight values can be initially set according to the model's application domain and continuously optimized during training. Finally, the student model is optimized using the knowledge distillation loss parameter to obtain the target image recognition model. When applied to image classification processing, the complete knowledge distillation process is shown in the flowchart of Figure 4. In this embodiment, combining the classification loss parameter and the sample distribution loss parameter to optimize the student model can effectively improve the knowledge distillation effect when applied to classification models and improve the computer's processing efficiency during the knowledge distillation process.
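The combination of the two loss parameters can be sketched as a weighted sum. The text states only that pre-assigned weights are used, so the weighted-sum form, the function names, the weight values, and the sample numbers below are all illustrative assumptions.

```python
import numpy as np

def classification_loss(logits, labels):
    """Mean softmax cross-entropy between classifier logits and integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def knowledge_distillation_loss(distribution_loss, cls_loss, w_dist=1.0, w_cls=1.0):
    """Assumed combination: a weighted sum of the two loss parameters."""
    return w_dist * distribution_loss + w_cls * cls_loss

# Hypothetical student classifier outputs for two samples and three classes.
logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])
labels = np.array([0, 1])

cls_loss = classification_loss(logits, labels)
distribution_loss = 0.5  # placeholder value standing in for the sample distribution loss
total_loss = knowledge_distillation_loss(distribution_loss, cls_loss, w_dist=2.0, w_cls=1.0)
```

Keeping the weights as explicit arguments mirrors the description that they are set per application domain and adjusted during training.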
[0070] In an exemplary embodiment, the target image recognition model includes an in-vehicle image recognition model. Step 209 includes: acquiring an in-vehicle image captured by an in-vehicle camera; inputting the in-vehicle image into the in-vehicle image recognition model to obtain the image recognition result output by the in-vehicle image recognition model; determining the obstacle category and obstacle location in the in-vehicle image based on the image recognition result; performing path planning processing based on the obstacle category and obstacle location to generate autonomous driving instructions.
[0071] For example, the solution of this application can be used to recognize and process in-vehicle images during autonomous driving. Since in-vehicle image processing requires edge processing on an in-vehicle computer, after the in-vehicle image recognition model is pre-trained with large-scale data, knowledge distillation can be used to simplify the model. After simplification using the above method, a lightweight in-vehicle image recognition model that can be deployed on an in-vehicle computer is obtained. While the vehicle is driving, the in-vehicle computer collects images of the vehicle's surroundings through the in-vehicle cameras, such as the image shown in Figure 5, from which vehicles, pedestrians, obstacles, road signs, and other information around the vehicle can be identified. The in-vehicle image recognition model then identifies the categories and locations of these obstacles and outputs this information. Based on the obstacle categories and locations identified by the model, the in-vehicle computer performs real-time path planning to generate autonomous driving instructions that control the vehicle's movement, thereby dynamically controlling the vehicle's operation. In this embodiment, efficient knowledge distillation of the in-vehicle image recognition model effectively speeds up the deployment of the model in the field of autonomous driving.
[0072] In an exemplary embodiment, the target image recognition model includes an image review model. Step 209 includes: acquiring an image to be reviewed; inputting the image to be reviewed into the image review model to obtain the image review result output by the image review model; constructing model training data containing the image to be reviewed and the review result; and optimizing the image review model based on the model training data.
[0073] For example, the model data processing method of this application can also be applied to the field of image review, for knowledge distillation of image review models. Since the types of non-compliant images change over time, the image review model also needs to continuously update its parameters to adapt to new review requirements. In this case, the model data processing method of this application can be used to achieve knowledge distillation and continuous optimization of the image review model. After knowledge distillation of the image review model is completed, the model can be deployed to a platform for image review processing. The acquired images to be reviewed are input into the image review model, and the image review results output by the model are obtained. These images can then serve as the basic data for subsequent model optimization: model training data is constructed from the images and the corresponding review results, and the image review model is optimized using this training data. The specific training process is shown in Figure 6. This embodiment can automatically optimize the image review model over a long period of time, enabling the model to continuously adapt to changes in data distribution as data accumulates, thereby continuously improving its recognition capabilities.
[0074] This application also provides an application scenario to which the above model data processing method is applied. Specifically, the application of the model data processing method in this scenario is as follows:
[0075] When developing a machine learning model for image retrieval and ranking, developers pre-collect massive amounts of image data as training data to pre-train an initial machine learning model, resulting in an image retrieval model suitable for image retrieval and ranking. However, the developers find that this model has too many parameters and is complex, and that retrieval after inputting an image is time-consuming, making the model unsuitable for practical retrieval scenarios. Knowledge distillation is therefore needed to simplify the model. Current knowledge distillation methods focus only on preserving the features of individual samples, whereas image retrieval and ranking require ranking based on the distance between samples, so current methods cannot guarantee distillation effectiveness for this task. The model data processing method described in this application can therefore be used to process the image recognition model. The specific system architecture for the model data processing in this application is shown in Figure 7, and the process includes feature extraction, distance recognition, loss calculation, and model optimization.
[0076] First, a sample image set is obtained as input. Then, the image retrieval model is used as the teacher model and an initial student model is constructed, so that the knowledge of the teacher model is distilled into the simpler student model. For the input samples, features are extracted by the student model and by the teacher model, respectively. In the feature notation, the subscript indicates the sequence number of the feature and the superscript indicates the model used; n denotes the number of samples. All features are L2-normalized.
[0077] To convey the data distribution structure of the features extracted by the teacher model, for each feature extracted by the teacher model, the set of its K nearest neighbors among the teacher model features is first selected, that is, the K features ranked first when the teacher model features are sorted in ascending order of distance to that feature. The similarity between the feature and each feature in its K nearest neighbor set is then calculated as the dot product of the two feature vectors. For a teacher model feature that does not belong to the K nearest neighbors of the feature, its similarity to the feature is recorded as 0, because for sample pairs that are far apart, changes in their relative distance do not affect model performance. The similarity of a feature to itself is recorded as 1.
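[0078] A similarity value is thus recorded for each ordered pair of features. Note that since the similarity is determined by the KNN relationship, the similarity of feature i to feature j need not equal the similarity of feature j to feature i. According to the above method, if feature j is in the KNN of feature i, the similarity is the dot product of the two feature vectors; otherwise, the similarity is 0.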
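The KNN-restricted similarity described above can be sketched as a matrix in which entry (i, j) holds the dot product when feature j is among the K nearest neighbors of feature i, 0 otherwise, and 1 on the diagonal. The function name `knn_similarity_matrix` and the toy features are hypothetical, and a dense matrix is used only for readability.

```python
import numpy as np

def knn_similarity_matrix(features, k):
    """Similarity matrix restricted to K nearest neighbours.

    Assumes L2-normalized rows. Entry (i, j) is the dot product if j is one of
    the k nearest neighbours of i, 0 otherwise; the diagonal is set to 1.
    """
    n = features.shape[0]
    dots = features @ features.T
    ranked = dots.copy()
    np.fill_diagonal(ranked, -np.inf)  # exclude self when ranking neighbours
    neighbours = np.argsort(-ranked, axis=1, kind="stable")[:, :k]
    sims = np.zeros((n, n))
    rows = np.arange(n)[:, None]
    sims[rows, neighbours] = dots[rows, neighbours]
    np.fill_diagonal(sims, 1.0)  # similarity of a feature to itself
    return sims

# Hypothetical unit-length features at known angles.
angles = np.deg2rad([0.0, 10.0, 25.0, 90.0, 100.0])
features = np.stack([np.cos(angles), np.sin(angles)], axis=1)
sims = knn_similarity_matrix(features, k=2)
```

In this toy data the feature at 25 degrees is one of the two nearest neighbors of the feature at 90 degrees, but not the other way round, so the matrix is not symmetric.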
[0079] To avoid the influence of sparse feature distribution on the absolute value of feature similarity, this design uses relative feature similarity to measure the distance relationship between samples. The calculation method is as follows:
[0080]
[0081] The above formula converts the distance between two features into the relative distance between samples, which is compatible with different sample distribution densities.
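One form consistent with the description (the ratio of each similarity to the total over the K nearest neighbor set, the natural exponential function, and the hyperparameter β mentioned for the earlier embodiment) can be written as follows. This is a reconstruction under assumptions, not the formula as filed: the symbols $s_{ij}$ for the similarity of feature i to feature j, $p_{ij}$ for the relative similarity, and $\mathcal{N}_i$ for the K nearest neighbor set of feature i are assumed notation, and placing β as a divisor inside the exponential is also an assumption.

```latex
p_{ij} =
\begin{cases}
\dfrac{\exp\left(s_{ij}/\beta\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(s_{ik}/\beta\right)}, & j \in \mathcal{N}_i, \\[2ex]
0, & \text{otherwise.}
\end{cases}
```

Under this form, each row of relative similarities sums to 1 over the neighbor set, so it can be treated as a probability distribution in a cross-entropy constraint.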
[0082] For the feature set extracted by the student model, the K-nearest neighbor relationship is likewise used to characterize the sample distribution structure. For each feature extracted by the student model, the set of its K nearest neighbors among the student model features is selected, that is, the K features ranked first when the student model features are sorted in ascending order of distance to that feature. Similarly, Formula 1 is used to calculate the relative similarity between features.
[0083] The relative similarities calculated for the teacher model and for the student model represent the distance relationships between samples in the features extracted by the teacher model and the student model, respectively. This application proposes that the distance relationships between samples obtained by the student model should be consistent with those obtained by the teacher model. Therefore, this application proposes using cross-entropy to constrain the output of the student model:
[0084]
[0085] The above formula iterates through all the distance relationships between samples. The number of pairwise distances grows with the square of the number of samples, i.e., on the order of n² for n samples. However, each sample has non-zero relative similarity only with its K nearest neighbors, so for a fixed K the computational complexity of the above formula is O(n). This application can therefore constrain the global distance relationship between samples with relatively small computational complexity.
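A cross-entropy constraint of the kind described can be written, under assumptions, as below. The symbols are assumed notation rather than the notation as filed: $p^{t}_{ij}$ and $p^{s}_{ij}$ are the relative similarities from the teacher and student features, $\mathcal{N}_i$ is the K nearest neighbor set of sample i, and $L_{\text{dist}}$ is the sample distribution loss parameter.

```latex
L_{\text{dist}} = -\sum_{i=1}^{n} \sum_{j \in \mathcal{N}_i} p^{t}_{ij} \log p^{s}_{ij}
```

Written this way, the inner sum has K terms for each of the n samples, which gives nK terms in total and matches the O(n) cost for a fixed K stated above.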
[0086] After knowledge distillation of the image retrieval model is completed through the above processing, the method is validated on the corresponding image retrieval task. Using the Market-1501 dataset, a retrieval model is first trained with the ResNet101 model, and a ResNet50 model with fewer parameters is then trained under the supervision of this model. When training the student model, a cross-entropy-based classification constraint loss function is additionally used.
[0087] The ResNet50 model obtained using the teacher model supervision method proposed in this application achieves a retrieval mean average precision (mAP) of 84.5%, while the ResNet50 model obtained using existing single-feature-based supervision methods achieves an mAP of 81.7%, and the ResNet50 model trained without teacher model supervision achieves an mAP of 80.2%. The model knowledge distillation method proposed in this application therefore trains a model with higher accuracy than existing methods, indicating that the proposed method has higher knowledge transfer efficiency and is more suitable for tasks related to feature ranking.
[0088] In a specific embodiment, the flowchart of the model data processing method of this application is shown in Figure 8, and the method includes:
[0089] Step 801: Extract image features from the input sample image set based on the teacher model and the student model respectively, to obtain a first sample feature dataset corresponding to the teacher model and a second sample feature dataset corresponding to the student model.
Step 803: Obtain the sample distribution data of the first sample feature dataset.
Step 805: Determine the K-nearest neighbor value based on the sample distribution data.
Step 807: Construct a K-nearest neighbor sample set for each image feature data in the first sample feature dataset based on the K-nearest neighbor value.
Step 809: Determine the feature vectors of the image feature data and the sample feature data.
Step 811: Determine the feature similarity between each image feature data and the feature data in the corresponding K-nearest neighbor sample set based on the dot product of the feature vector of the image feature data and the feature vector of the sample feature data.
Step 813: Summarize the feature similarities in the K-nearest neighbor sample set of the image feature data to obtain the total similarity data.
Step 815: Determine the first feature distance relationship corresponding to the image feature data based on the ratio of each feature similarity to the total similarity data.
Step 817: Summarize the first feature distance relationships of all image feature data to obtain the first feature distance relationship between image features in the first sample feature dataset.
Step 819: Determine the second feature distance relationship between image features in the second sample feature dataset.
Step 821: Traverse the first feature distance relationship and the second feature distance relationship using the cross-entropy loss function to obtain the sample distribution loss parameter for the relative distance of samples.
Step 823: Optimize the student model based on the sample distribution loss parameter to obtain the target image recognition model.
[0090] It should be understood that although the steps in the flowcharts of the above embodiments are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the above embodiments may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.
[0091] Based on the same inventive concept, this application also provides a model data processing apparatus for implementing the model data processing method described above. The solution provided by this apparatus is similar to the implementation scheme described in the above method; therefore, the specific limitations in one or more model data processing apparatus embodiments provided below can be found in the limitations of the model data processing method described above, and will not be repeated here.
[0092] In one exemplary embodiment, as shown in Figure 9, a model data processing device is provided, comprising:
[0093] The feature extraction module 902 is used to perform image feature extraction processing on the input sample image set based on the teacher model and the student model respectively, to obtain a first sample feature dataset corresponding to the teacher model and a second sample feature dataset corresponding to the student model.
[0094] The distance recognition module 904 is used to recognize the first feature distance relationship between image features in the first sample feature dataset and the second feature distance relationship between image features in the second sample feature dataset.
[0095] The loss calculation module 906 is used to determine the sample distribution loss parameters of the relative distance between samples based on the first feature distance relationship and the second feature distance relationship.
[0096] The model optimization module 908 is used to optimize the student model based on the sample distribution loss parameters to obtain the target image recognition model.
[0097] The aforementioned model data processing device first performs image feature extraction processing on the input sample image set based on the teacher model and the student model respectively, obtaining a first sample feature dataset corresponding to the teacher model and a second sample feature dataset corresponding to the student model. This provides the basic data for the model distillation process. The device then identifies the first feature distance relationship between image features in the first sample feature dataset and the second feature distance relationship between image features in the second sample feature dataset. Based on these first and second feature distance relationships, a sample distribution loss parameter for relative sample distance is determined. This parameter constrains the distribution of sample features extracted by the student model to maintain structural consistency with the distribution of sample features extracted by the teacher model. Finally, the student model is optimized based on the sample distribution loss parameter to obtain the target image recognition model, efficiently transferring knowledge from the teacher model to the student model. This application extracts sample features using both teacher and student models and obtains the sample distribution loss parameter based on the distance relationship between sample features. By constraining the knowledge distillation process using the sample distribution loss parameter, the student model learns the feature distribution structure of the teacher model, thereby more efficiently transferring knowledge from the teacher model and improving the processing efficiency of the computer in executing the model knowledge distillation process.
[0098] In one embodiment, the distance recognition module 904 is specifically used to: construct a set of K nearest neighbor samples for each image feature data in the first sample feature dataset; determine the feature similarity between each image feature data and the corresponding sample feature data in the set of K nearest neighbor samples; determine the first feature distance relationship corresponding to the image feature data based on the feature similarity; and summarize the first feature distance relationships of all image feature data to obtain the first feature distance relationship between image features in the first sample feature dataset.
[0099] In one embodiment, the distance recognition module 904 is specifically used to: determine the feature vector of the image feature data and the feature vector of the sample feature data; and determine the feature similarity between each image feature data and the feature data in the corresponding K nearest neighbor sample set based on the dot product result of the feature vector of the image feature data and the feature vector of the sample feature data.
[0100] In one embodiment, the distance recognition module 904 is specifically used to: summarize the feature similarity in the K nearest neighbor sample set of image feature data to obtain the total similarity data; and determine the first feature distance relationship corresponding to the image feature data based on the ratio of each feature similarity to the total similarity data.
[0101] In one embodiment, the distance recognition module 904 is specifically used to: obtain sample distribution data of the first sample feature dataset; determine the K nearest neighbor value based on the sample distribution data; and construct a set of K nearest neighbor samples for each image feature data in the first sample feature dataset based on the K nearest neighbor value.
[0102] In one embodiment, the loss calculation module 906 is specifically used to: traverse the first feature distance and the second feature distance through the cross-entropy loss function to obtain the sample distribution loss parameter of the relative distance of the samples.
[0103] In one embodiment, the teacher model and the student model include a classification model; the model optimization module 908 is specifically used to: determine the sample classification result of the student model based on the second sample feature dataset; determine the classification loss parameter of the student model based on the sample classification result; determine the knowledge distillation loss parameter based on the sample distribution loss parameter and the classification loss parameter; and optimize the student model based on the knowledge distillation loss parameter to obtain the target image recognition model.
[0104] In one embodiment, the target image recognition model includes an in-vehicle image recognition model. The device also includes a path planning module, configured to: acquire in-vehicle images captured by an in-vehicle camera; input the in-vehicle images into the in-vehicle image recognition model to obtain image recognition results output by the in-vehicle image recognition model; determine the obstacle category and obstacle location in the in-vehicle image based on the image recognition results; and perform path planning processing based on the obstacle category and obstacle location to generate autonomous driving instructions.
[0105] In one embodiment, the target image recognition model includes an image review model. The apparatus further includes an image review module, configured to: acquire an image to be reviewed; input the image to be reviewed into the image review model to acquire the image review result output by the image review model; construct model training data containing the image to be reviewed and the review result; and optimize the image review model based on the model training data.
[0106] Each module in the aforementioned model data processing device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the operations corresponding to each module.
[0107] In one exemplary embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 10. This computer device includes a processor, memory, input/output (I/O) interfaces, and a communication interface. The processor, memory, and I/O interfaces are connected via a system bus, and the communication interface is also connected to the system bus via the I/O interfaces. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and a database. The internal memory provides the environment for running the operating system and computer programs stored in the non-volatile storage media. The database stores data related to model data processing. The I/O interfaces are used for exchanging information between the processor and external devices. The communication interface is used for communicating with external terminals via a network connection. When the computer program is executed by the processor, it implements a model data processing method.
[0108] Those skilled in the art will understand that the structure shown in Figure 10 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.
[0109] In one embodiment, a computer device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above method embodiments.
[0110] In one embodiment, a computer-readable storage medium is provided storing a computer program that, when executed by a processor, implements the steps in the above method embodiments.
[0111] In one embodiment, a computer program product or computer program is provided, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, causing the computer device to perform the steps in the above method embodiments.
[0112] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data must comply with relevant regulations.
[0113] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile memory and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, artificial intelligence (AI) processors, etc., and are not limited to these.
[0114] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this application.
[0115] The embodiments described above merely illustrate several implementations of this application. Although their descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.
Claims
1. A model data processing method, characterized in that the method comprises: performing image feature extraction on an input sample image set based on a teacher model and a student model respectively, to obtain a first sample feature dataset corresponding to the teacher model and a second sample feature dataset corresponding to the student model; identifying a first feature distance relationship between image features in the first sample feature dataset and a second feature distance relationship between image features in the second sample feature dataset, respectively; determining a sample distribution loss parameter of the relative distance of samples based on the first feature distance relationship and the second feature distance relationship; and optimizing the student model based on the sample distribution loss parameter to obtain a target image recognition model, the target image recognition model being used to perform recognition processing on an input image.
2. The method according to claim 1, characterized in that identifying the first feature distance relationship between image features in the first sample feature dataset comprises: constructing a K-nearest-neighbor sample set for each piece of image feature data in the first sample feature dataset; determining the feature similarity between each piece of image feature data and the feature data in the corresponding K-nearest-neighbor sample set; determining, based on the feature similarity, a first feature distance relationship corresponding to the image feature data; and aggregating the first feature distance relationships of all the image feature data to obtain the first feature distance relationship between image features in the first sample feature dataset.
3. The method according to claim 2, characterized in that determining the feature similarity between each piece of image feature data and the feature data in the corresponding K-nearest-neighbor sample set comprises: determining the feature vector of the image feature data and the feature vectors of the sample feature data; and determining, based on the dot product of the feature vector of the image feature data and the feature vectors of the sample feature data, the feature similarity between each piece of image feature data and the feature data in the corresponding K-nearest-neighbor sample set.
4. The method according to claim 2, characterized in that determining the first feature distance relationship corresponding to the image feature data based on the feature similarity comprises: aggregating the feature similarities within the K-nearest-neighbor sample set of the image feature data to obtain total similarity data; and determining, based on the ratio of each feature similarity to the total similarity data, the first feature distance relationship corresponding to the image feature data.
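For illustration only, the steps recited in claims 2 to 4 might be sketched as follows. This is a minimal NumPy sketch, not the applicant's implementation: the function name `knn_distance_relationship`, the fixed `k`, the exclusion of each sample from its own neighbor set, and the requirement that dot-product similarities be non-negative (so that the ratios form a valid distribution) are all assumptions that the claims do not state.

```python
import numpy as np


def knn_distance_relationship(features, k):
    """Return, for each sample, its K nearest neighbours and normalized similarities.

    features: array of shape (n_samples, dim); assumed to give non-negative
              dot products (for example, L2-normalized post-ReLU features).
    k:        hypothetical fixed neighbourhood size.
    """
    features = np.asarray(features, dtype=float)

    # Dot-product similarity between every pair of feature vectors (claim 3).
    sims = features @ features.T

    # Assumption: a sample is not its own neighbour.
    np.fill_diagonal(sims, -np.inf)

    # Indices of the k most similar samples for each row (claim 2).
    neighbours = np.argsort(-sims, axis=1)[:, :k]
    knn_sims = np.take_along_axis(sims, neighbours, axis=1)

    # Ratio of each similarity to the total similarity in the set (claim 4).
    total = knn_sims.sum(axis=1, keepdims=True)
    ratios = knn_sims / total
    return neighbours, ratios


if __name__ == "__main__":
    toy = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
    idx, rel = knn_distance_relationship(toy, k=2)
    print(idx)
    print(rel)
```

Each row of the returned `ratios` array sums to 1, so it can be read as a distribution over that sample's neighbors. Stacking the rows corresponds to the aggregation step of claim 2.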
5. The method according to claim 2, characterized in that constructing the K-nearest-neighbor sample set for each piece of image feature data in the first sample feature dataset comprises: obtaining sample distribution data of the first sample feature dataset; determining a K-nearest-neighbor value based on the sample distribution data; and constructing, based on the K-nearest-neighbor value, the K-nearest-neighbor sample set for each piece of image feature data in the first sample feature dataset.
6. The method according to claim 2, characterized in that determining the sample distribution loss parameter of the relative distance of samples based on the first feature distance relationship and the second feature distance relationship comprises: traversing the first feature distance relationship and the second feature distance relationship with a cross-entropy loss function to obtain the sample distribution loss parameter of the relative distance of samples.
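As a hedged illustration of claim 6, the cross-entropy between the two distance relationships could be computed as below. The sketch assumes that both relationships are arrays of per-sample neighbor distributions of the same shape, that the student's similarities are evaluated over the same neighbor sets as the teacher's, and that the per-sample losses are averaged. The function name `sample_distribution_loss` and the stabilizing constant `eps` are hypothetical; none of these choices is fixed by the claims.

```python
import numpy as np


def sample_distribution_loss(teacher_rel, student_rel, eps=1e-12):
    """Cross-entropy between teacher and student neighbour distributions.

    teacher_rel, student_rel: arrays of shape (n_samples, k) whose rows sum
    to 1 (the first and second feature distance relationships, assumed to be
    aligned on the same neighbour sets).
    """
    teacher_rel = np.asarray(teacher_rel, dtype=float)
    student_rel = np.asarray(student_rel, dtype=float)

    # Iterate over every sample and neighbour: -sum_j t_ij * log(s_ij).
    per_sample = -(teacher_rel * np.log(student_rel + eps)).sum(axis=1)

    # Assumption: average over the samples to get one scalar loss.
    return float(per_sample.mean())


if __name__ == "__main__":
    teacher = [[0.5, 0.5]]
    print(sample_distribution_loss(teacher, [[0.5, 0.5]]))
    print(sample_distribution_loss(teacher, [[0.9, 0.1]]))
```

The loss is smallest when the student's neighbor distribution matches the teacher's, which is the constraint on relative sample distances that the method relies on.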
7. The method according to any one of claims 1 to 6, characterized in that the teacher model and the student model comprise classification models, and optimizing the student model based on the sample distribution loss parameter to obtain the target image recognition model comprises: determining a sample classification result of the student model based on the second sample feature dataset; determining a classification loss parameter of the student model based on the sample classification result; determining a knowledge distillation loss parameter based on the sample distribution loss parameter and the classification loss parameter; and optimizing the student model based on the knowledge distillation loss parameter to obtain the target image recognition model.
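Claim 7 does not say how the two loss parameters are combined. One common choice, shown here purely as an assumption, is a weighted sum of a softmax cross-entropy classification loss and the sample distribution loss. The function names and the `weight` parameter are hypothetical.

```python
import numpy as np


def classification_loss(logits, labels):
    """Mean softmax cross-entropy of student logits against integer labels."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Pick the log-probability of the true class for each sample.
    picked = log_probs[np.arange(len(labels)), labels]
    return float(-picked.mean())


def knowledge_distillation_loss(distribution_loss, cls_loss, weight=1.0):
    """Assumed combination: classification loss plus weighted distribution loss."""
    return cls_loss + weight * distribution_loss


if __name__ == "__main__":
    cls = classification_loss([[2.0, 0.5], [0.1, 1.5]], [0, 1])
    print(knowledge_distillation_loss(distribution_loss=0.7, cls_loss=cls, weight=0.5))
```

In a training loop, the resulting scalar would be the quantity minimized with respect to the student model's parameters while the teacher model is held fixed.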
8. The method according to claim 1, characterized in that the target image recognition model comprises an in-vehicle image recognition model, and the method further comprises: acquiring an in-vehicle image captured by a camera of the vehicle; inputting the in-vehicle image into the in-vehicle image recognition model to obtain an image recognition result output by the in-vehicle image recognition model; determining, based on the image recognition result, the obstacle category and the obstacle location in the in-vehicle image; and performing path planning based on the obstacle category and the obstacle location to generate autonomous driving instructions.
9. The method according to claim 1, characterized in that the target image recognition model comprises an image review model, and the method further comprises: obtaining an image to be reviewed; inputting the image to be reviewed into the image review model to obtain an image review result output by the image review model; constructing model training data containing the image to be reviewed and the image review result; and optimizing the image review model based on the model training data.
10. A model data processing device, characterized in that the device comprises: a feature extraction module, used to perform image feature extraction on an input sample image set based on a teacher model and a student model respectively, to obtain a first sample feature dataset corresponding to the teacher model and a second sample feature dataset corresponding to the student model; a distance recognition module, used to identify a first feature distance relationship between image features in the first sample feature dataset and a second feature distance relationship between image features in the second sample feature dataset, respectively; a loss calculation module, used to determine a sample distribution loss parameter of the relative distance of samples based on the first feature distance relationship and the second feature distance relationship; and a model optimization module, used to optimize the student model based on the sample distribution loss parameter to obtain a target image recognition model, the target image recognition model being used to perform recognition processing on an input image.
11. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 9.
12. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.
13. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.