A food image hierarchical construction and classification method based on error analysis

By combining a flat classifier and a hierarchical classifier, and using a third classifier for selective hierarchical classification, the problem of unclear hierarchical structure is solved, the classification accuracy is improved, and it is applicable to various flat classification scenarios.

CN115346069B · Active · Publication Date: 2026-06-16 · ZHEJIANG UNIV CITY COLLEGE

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ZHEJIANG UNIV CITY COLLEGE
Filing Date
2022-08-18
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing hierarchical classification methods struggle to construct suitable hierarchical structures when the hierarchical relationships between classes are unclear, resulting in low accuracy in class discrimination and difficulty in improving the final classification effect, especially in large-scale classification tasks.

Method used

An error analysis-based approach is adopted, combining a flat classifier and a hierarchical classifier. A third classifier is used for selective hierarchical classification to construct a hierarchical structure. By leveraging the complementary advantages of the flat classifier and the hierarchical classifier, a selective hierarchical classifier is designed.

Benefits of technology

It improves classification accuracy in large-scale classification tasks where the hierarchical relationship between classes is not obvious, breaks through the limitations of traditional hierarchical classification methods, enhances classification performance, and is applicable to various flat classification scenarios.

Abstract

The application relates to a food image hierarchical construction and classification method based on error analysis, which comprises the following steps: S1: obtaining a food image data set Food-M, and training an M-class flat classifier; and S2: according to the prediction error of the M-class flat classifier on a verification set, finding significant major-class features, and performing class merging. The application breaks through the limitation of existing hierarchical classification methods in large-scale classification scenarios with an unobvious hierarchical structure: a major-class discriminator is supplemented with a flat classifier to serve as a hierarchical classifier, the complementary advantages of the hierarchical classifier and the flat classifier are fully utilized, and selective hierarchical classification is realized by means of a third classifier.

Description

Technical Field

[0001] This invention relates to the field of food image analysis technology, specifically to a method for constructing and classifying food image hierarchies based on error analysis.

Background Technology

[0002] Food identification is fundamental to diet monitoring and management. Many subsequent tasks, including calorie calculation, nutritional analysis, and dietary health management, require accurate food identification. The variety of foods to be identified is vast, and accuracy decreases as the number of categories grows, making it difficult to meet the needs of real-world applications. Hierarchical classification addresses this by constructing a hierarchical structure and replacing flat classification: a judgment is first made among broad categories, and then within each broad category individually. This eliminates constraints between major categories and alleviates the problem of too many types of food.

[0003] Error analysis-based hierarchical classification methods do not require specific hierarchical relationships between classes, overcoming the limitations of existing hierarchical classification methods in terms of application scenarios. However, the unclear hierarchical relationships between classes present the following challenges to hierarchical classification: constructing a suitable hierarchical structure is difficult; the accuracy of class distinction is too low; and the final classification effect is difficult to improve compared to flat classification. To address these issues, this invention proposes a method for constructing and classifying food image hierarchical structures based on error analysis.

Summary of the Invention

[0004] To address the shortcomings of existing technologies, the first objective of this invention is to provide a method for constructing and classifying food image hierarchies based on error analysis. This invention overcomes the limitations of existing hierarchical classification methods in large-scale classification scenarios where the hierarchical structure is not obvious. A classifier for large categories is supplemented by a flat classifier to act as a hierarchical classifier, fully utilizing the complementary advantages of the hierarchical classifier and the flat classifier, and employing a third classifier to achieve selective hierarchical classification.

[0005] To solve the above-mentioned technical problems, the present invention is achieved through the following technical solution:

[0006] A method for constructing and classifying food image hierarchies based on error analysis includes the following steps:

[0007] S1: Obtain the food image dataset Food-M, divide it into training set, validation set and test set, and train an M-class flat classifier;

[0008] S2: Based on the prediction error of the M-class flat classifier on the validation set, find significant major class features and merge the categories to obtain the hierarchical structure corresponding to C major classes;

[0009] S3: Train a C-class classifier for the C major classes obtained by merging, and further fuse the discrimination of the M-class flat classifier and the C-class classifier to obtain an M-class hierarchical classifier and its output discrimination;

[0010] S4: Based on the predictions of the M-class flat classifier on the validation set, find the set of most easily confused classes for each predicted class;

[0011] S5: Design a third classifier to make a classification decision between the predicted class of the flat classifier and the set of most easily confused classes of that class, and realize the determination of the prediction of the flat classifier, including choosing the prediction of the flat classifier or the hierarchical classifier as the final class determination.

[0012] S6. Combine the flat classifier, the major classifier, and the third classifier as a whole to form a selective hierarchical classifier.

[0013] Furthermore, in step S1, the dataset Food-M is divided into training set, validation set and test set in a ratio of 6:1:3, without considering the hierarchical relationship between classes, and an M-class flat classifier is trained.
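The 6:1:3 division in step S1 can be illustrated with a minimal sketch. The function name `split_dataset`, the fixed seed, and the plain random shuffle are assumptions made for illustration; the text does not state whether the split is stratified per class.

```python
import random

def split_dataset(items, ratios=(6, 1, 3), seed=0):
    """Shuffle items and split them into train/validation/test by integer ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)  # assumption: simple random split, not stratified
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Hypothetical dataset of 1000 sample ids
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```

The hierarchical relationship between classes plays no role here, which matches the statement that the flat classifier is trained without considering it.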

[0014] Furthermore, step S2 specifically includes:

[0015] Inference is performed on the validation set using the M-class flat classifier. Based on all samples that are predicted incorrectly in the validation set, the confusion elements between the true label and the predicted class are analyzed pairwise. Significant distinguishable features between the two classes are found, and other classes sharing the same feature are identified and merged with them.

[0016] For categories that cannot be merged, they are treated as separate major categories, resulting in the final hierarchical structure corresponding to C major categories.
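The bookkeeping around step S2 might look like the sketch below. The merging decision itself is a manual, feature-based judgment in the method, so the code only tallies the confused class pairs a person would inspect, and then turns a hand-chosen list of merged groups into a class-to-major-class mapping. The names `confusion_pairs`, `build_hierarchy`, and `merged_groups`, and the integer class ids, are hypothetical.

```python
from collections import Counter

def confusion_pairs(true_labels, predicted_labels):
    """Count unordered class pairs over misclassified validation samples."""
    counts = Counter()
    for t, p in zip(true_labels, predicted_labels):
        if t != p:
            counts[tuple(sorted((t, p)))] += 1
    return counts.most_common()  # most frequently confused pairs first

def build_hierarchy(all_classes, merged_groups):
    """Map each class to a major-class id; unmerged classes become their own major class."""
    class_to_major = {}
    for major_id, group in enumerate(merged_groups):
        for c in group:
            class_to_major[c] = major_id
    next_id = len(merged_groups)
    for c in all_classes:
        if c not in class_to_major:
            class_to_major[c] = next_id
            next_id += 1
    return class_to_major

# Hypothetical validation labels and flat-classifier predictions
y_true = [0, 0, 1, 2, 3]
y_pred = [1, 1, 0, 2, 2]
pairs = confusion_pairs(y_true, y_pred)

# Suppose a human reviewer decides classes 0 and 1 share a major-class feature
class_to_major = build_hierarchy(range(5), merged_groups=[[0, 1]])
num_major = len(set(class_to_major.values()))  # this is C
```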

[0017] Furthermore, step S3 specifically includes:

[0018] Based on the hierarchical structure corresponding to the C major categories obtained in step S2, train a C major category classifier for the C major categories;

[0019] Combining the top-5 predicted categories of the M-class flat classifier and the major class predicted categories of the C-class major classifier, the category that belongs to the major predicted category and ranks first among the top-5 predicted categories is selected as the output discrimination of the M-class hierarchical classifier;

[0020] If no category meets the criteria, the top-1 predicted category of the flat classifier is selected and used as the output of the M-class hierarchical classifier.
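The fusion rule of step S3 can be sketched as a short function. It assumes the flat classifier's top-5 classes are already sorted by descending score and that `class_to_major` is the mapping produced by the hierarchy construction; both names are illustrative and not taken from the source.

```python
def hierarchical_predict(flat_top5, predicted_major, class_to_major):
    """Return the first top-5 class inside the predicted major class, else the flat top-1."""
    for cls in flat_top5:
        if class_to_major[cls] == predicted_major:
            return cls
    return flat_top5[0]  # no top-5 class matches: fall back to the flat prediction

# Hypothetical mapping of 7 classes onto 5 major classes
class_to_major = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 3, 6: 4}
flat_top5 = [2, 0, 3, 1, 4]

differs = hierarchical_predict(flat_top5, 0, class_to_major)   # top-1 is outside major class 0
fallback = hierarchical_predict(flat_top5, 4, class_to_major)  # major class 4 not in top-5
agrees = hierarchical_predict(flat_top5, 1, class_to_major)    # top-1 already in major class 1
```

Only the first call produces a prediction that differs from the flat classifier, which is the case where the hierarchical classifier can add information.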

[0021] Furthermore, step S4 specifically includes:

[0022] Based on the predictions of the M-class flat classifier on the validation set, in the case of incorrect prediction, the true label corresponding to each predicted class is added to the set of most confusing classes for that predicted class; in the case of correct prediction, the classes at the top-2 positions are added to the set of most confusing classes for that predicted class.
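One way to read step S4 is sketched below. It interprets "the classes at the top-2 positions" as the runner-up (second-ranked) class of a correctly predicted sample, which is an assumption; the function name `build_confusable_sets` and the input layout are also hypothetical.

```python
from collections import defaultdict

def build_confusable_sets(true_labels, flat_top2):
    """Build, per predicted class, the set of classes most easily confused with it.

    flat_top2 holds one (top1, top2) pair per validation sample.
    """
    sets = defaultdict(set)
    for t, (top1, top2) in zip(true_labels, flat_top2):
        if top1 != t:
            sets[top1].add(t)     # misprediction: the true label is confusable with top1
        else:
            sets[top1].add(top2)  # correct prediction: the runner-up is confusable with top1
    return dict(sets)

# Hypothetical validation results
y_true = [0, 1, 2]
top2 = [(0, 3), (0, 1), (2, 0)]
confusable = build_confusable_sets(y_true, top2)
```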

[0023] Furthermore, in step S5:

[0024] The output categories of the third classifier consist of the M possible predicted categories of the flat classifier and the corresponding M sets of easily confused categories. The i-th set of easily confused categories contains $T_i$ categories, and each of these $T_i$ categories is assigned a separate output position, resulting in a third classifier with $(T + M)$ output categories, where $T_i$ is much smaller than M, T is much larger than M, and $T = \sum_{i=1}^{M} T_i$;

[0025] During inference, the third classifier, based on the predicted category of the flat classifier, only considers the output position of the predicted category and the single set of easily confused categories corresponding to that predicted category, for a total of $(T_i + 1)$ output positions;

[0026] If the category predicted by the third classifier is the same as the category predicted by the flat classifier, then the prediction of the flat classifier is selected as the final category determination; if the category predicted by the third classifier is in the set of easily confused categories, then the prediction of the hierarchical classifier is selected as the final category determination.
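The output layout and the restricted decision of step S5 might be sketched as follows. The ordering of positions (the M flat-prediction positions first, then one position per confusable class) and the names `build_output_layout` and `select_final` are assumptions; the source only fixes the total number of outputs and the selection rule.

```python
def build_output_layout(confusable_sets, num_classes):
    """Assign positions [0, M) to flat predictions, then one position per (i, confusable class)."""
    slot = {}
    next_pos = num_classes
    for i in range(num_classes):
        for c in sorted(confusable_sets.get(i, ())):
            slot[(i, c)] = next_pos
            next_pos += 1
    return slot, next_pos  # next_pos equals M + T

def select_final(third_scores, flat_pred, hier_pred, slot):
    """Restrict the third classifier to T_i + 1 positions and pick flat or hierarchical output."""
    candidates = [flat_pred] + [pos for (i, _), pos in slot.items() if i == flat_pred]
    best = max(candidates, key=lambda pos: third_scores[pos])
    # Position flat_pred means "trust the flat classifier"; any other position lies in
    # the confusable set, so the hierarchical classifier's prediction is used instead.
    return flat_pred if best == flat_pred else hier_pred

# Hypothetical confusable sets for M = 3 classes
slot, total_outputs = build_output_layout({0: {1, 2}, 1: {0}}, num_classes=3)

keep_flat = select_final([0.8, 0.2, 0.0, 0.05, 0.1, 0.3], flat_pred=0, hier_pred=2, slot=slot)
use_hier = select_final([0.1, 0.2, 0.0, 0.05, 0.9, 0.3], flat_pred=0, hier_pred=2, slot=slot)
```

Scores at positions outside the selected mini-classifier are ignored, so a high score elsewhere cannot influence the decision.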

[0027] A second objective of this invention is to provide an electronic device comprising:

[0028] One or more processors;

[0029] Storage device for storing one or more programs.

[0030] When the one or more programs are executed by the one or more processors, the one or more processors perform the method as described in the first objective above.

[0031] A third objective of the present invention is to provide a computer-readable medium having a computer program stored thereon, which, when executed by a processor, implements the method described in the first objective above.

[0032] Compared with the prior art, the present invention has the following advantages and beneficial effects:

[0033] This invention combines a manually constructed hierarchical structure, hierarchical classifier settings, the construction of the most easily confused category set, and a third classifier for secondary discrimination. The secondary discrimination by the third classifier optimizes the selective hierarchical classification effect and minimizes the cost of secondary discrimination for categories predicted by the flat classifier. This invention provides a solution to challenges such as unclear hierarchical relationships between categories, low accuracy of major categories, and the inapplicability of hierarchical classification, achieving a universal hierarchical construction and classification method. Specifically, a major category discriminator is supplemented by a flat classifier as a hierarchical classifier, fully utilizing the complementary advantages of hierarchical and flat classifiers, and selective hierarchical classification is achieved with the help of a third classifier. The hierarchical classifier, combining the discrimination of the flat classifier, does not completely depend on the accuracy of major category discrimination, and can improve classification performance compared to flat classification. Furthermore, the third classifier allows for a choice between the flat and hierarchical classifiers, fully leveraging their complementary advantages. Ultimately, the selective hierarchical classifier achieves further improvements over the hierarchical classifier. This invention has no requirements on the hierarchical structure between categories and is universally applicable in all flat classification scenarios.

Attached Figure Description

[0034] Figure 1 is a general framework diagram of an embodiment of the method of the present invention. In the diagram, circles and squares with the same fill texture correspond one-to-one. A single circle corresponds to a single output category of the flat classifier, while the corresponding square is the set of the most confusing categories for that output category, which may contain multiple categories.

[0035] Figure 2 is a hierarchical structure diagram in an embodiment of the method of the present invention.

[0036] Figure 3 is an example of a Grad-CAM heatmap comparison between the C-class major classifier and the M-class flat classifier on the Food-M dataset, as an embodiment of the method of the present invention.

[0037] Figure 4 is a flowchart of an embodiment of the method of the present invention.

Detailed Implementation

[0038] To enable those skilled in the art to better understand the technical solutions of the present invention, preferred embodiments of the present invention are described below in conjunction with specific examples. However, it should be understood that the accompanying drawings are for illustrative purposes only and should not be construed as limiting the present invention. For better illustration of this embodiment, some components in the drawings may be omitted, enlarged, or reduced, and do not represent the actual dimensions of the product. It is understandable that some well-known structures and their descriptions may be omitted in the drawings for those skilled in the art. The positional relationships described in the drawings are for illustrative purposes only and should not be construed as limiting the present invention.

[0039] The present invention will be further described below with reference to the accompanying drawings and embodiments, but this should not be construed as limiting the present invention.

[0040] As shown in Figures 1 to 4, a method for constructing and classifying food image hierarchies based on error analysis is implemented by combining flat classification and hierarchical classification. This method includes training a flat classifier, manually constructing a hierarchical structure, training a major classifier, setting a hierarchical classifier, constructing a set of most easily confused categories, training a third classifier, and setting a selective hierarchical classifier, as shown in Figure 1. It is a simple and universal hierarchical structure construction and hierarchical classification model.

[0041] In one illustrated embodiment, the method includes the following steps:

[0042] The first step is to obtain the food image dataset Food-M and train an M-class flat classifier.

[0043] In one example, the Food-M food image dataset contains M food categories, and all data are divided into training, validation, and test sets in a 6:1:3 ratio. The validation set is used to select the best model and prevent overfitting on the training set. The selection of the class merging method based on the accuracy of the major categories is performed only on the validation set; the confusion on the test set cannot be used as prior knowledge. When training the model, a ResNet101 model pre-trained on ImageNet is used as the initialization model. The trained M-class flat classifier is used as the major category discriminator to explore the best merging method. Furthermore, this M-class flat classifier serves as the benchmark for comparison with the final hierarchical classification method.

[0044] The second step is to identify significant major class features based on the prediction errors of the M-class flat classifier on the validation set and then merge the classes.

[0045] In one example, an M-class flat classifier is used to perform inference on the validation set. Based on all the samples that are predicted incorrectly in the validation set, the confusion elements between the true label and the predicted class are analyzed pairwise. Significant distinguishable features between the two classes (a certain raw material or cooking method) are found, and other classes sharing the same feature are identified and merged with them. The classes that cannot be merged, which make up most classes, are each treated as a separate major class to obtain the final hierarchical structure.

[0046] The selection of easily distinguishable features, the selection of major category features, and the merging of categories rely on subjective human judgment, so the final hierarchical structure is subjective. However, this does not affect the effectiveness of the final selective hierarchical classifier, which shows that the hierarchical classifier does not depend on a highly accurate major category discriminator and that the selective hierarchical classification method is robust.

[0047] The third step is to train a C-class classifier for the merged C major classes, and further fuse the discrimination of the M-class flat classifier and the C-class classifier to obtain a discrimination different from the M-class flat classifier, which is regarded as the M-class hierarchical classifier.

[0048] In one example, following the hierarchical structure obtained in the previous step, a C-class classifier is trained for the C major classes. Combining the top-5 predicted categories of the M-class flat classifier and the major class predicted by the C-class classifier, the category that belongs to the predicted major class and ranks first among the top-5 predicted categories is selected as the output discrimination of the M-class hierarchical classifier. If no category meets the condition, the top-1 predicted category of the flat classifier is selected as the output discrimination instead.

[0049] When none of the categories in the major class predicted by the major class discriminator are among the top-5 predicted categories of the flat classifier, the major class discriminator's judgment is ignored. This is because the probability of the true label appearing outside the top-5 is very low, while the probability of it appearing at positions 1 to 5 decreases progressively, so choosing the top-1 predicted category as the hierarchical classifier's output is optimal. In this case, the hierarchical classifier and the flat classifier make the same prediction. However, by using the major class discriminator to select the category that belongs to the predicted major class and ranks first among the top-5 predicted categories, a prediction different from that of the flat classifier can be obtained. Specifically, when the category at the top-1 position does not belong to the predicted major class, but there are categories at positions 2 to 5 that do belong to it, the predictions of the hierarchical classifier and the flat classifier will differ.

[0050] The fourth step is to find the set of most easily confused categories for each predicted category based on the predictions of the M-class flat classifier on the validation set.

[0051] In one example, based on the predictions of the M-class flat classifier on the validation set, in the case of misprediction, the true label corresponding to each predicted class is added to the set of most confusing classes for that predicted class; in the case of correct prediction, the top-2 classes are added to the set of most confusing classes for that predicted class. Adding the top-2 classes to the set of most confusing classes is to cover as many classes as possible, so that finding the set of most confusing classes for each predicted class on the validation set can generalize to the test set as much as possible. Only the top-2 are considered, without adding further classes, because even if more classes are added, the performance of the selective hierarchical classifier cannot be improved.

[0052] The fifth step is to design a third classifier to make a classification decision between the predicted class of the flat classifier and the set of most easily confused classes of that class, thereby determining whether to believe the prediction of the flat classifier or the prediction of the hierarchical classifier.

[0053] In one example, a third classifier is designed based on the set of most confusing categories for each category obtained in step four. This classifier is used to make a classification decision between the predicted category of the flat classifier and the set of most confusing categories for that category, thus determining whether to believe the prediction of the flat classifier or the prediction of the hierarchical classifier.

[0054] The third classifier indirectly acts as a binary classifier: by selecting between the flat classifier's prediction and the set of most easily confused categories, it chooses between the flat classifier and the hierarchical classifier. The flat classifier's prediction represents the flat classifier itself, while the set of most easily confused categories represents all other possible true labels. When the third classifier selects the set of most easily confused categories, the flat classifier's prediction is replaced by that of the hierarchical classifier, which can make a different prediction. Of course, even if the third classifier chooses the hierarchical classifier as the final category prediction, for a single sample, the hierarchical classifier's prediction may still be the same as the flat classifier's prediction.

[0055] The third classifier aims to classify each of the M possible flat-classifier predictions against its corresponding set of most confusing classes, which amounts to M mini-classifiers. A single classifier is trained to act as these M mini-classifiers to save resources and simplify the selective hierarchical classification method. A single class may appear in one or more mini-classifiers. Within a single mini-classifier, the label corresponding to the class is unique and definite; for the third classifier, however, the mini-classifier must be selected first before the label for that class can be determined, meaning a single class may correspond to multiple labels. During training, to balance the training of the M mini-classifiers, the mini-classifier for a sample is selected at random from all mini-classifiers containing its class. During inference, the selection of the mini-classifier depends on the prediction of the flat classifier.
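The random mini-classifier selection used during training could be sketched as below. This is one possible reading: a class belongs to its own mini-classifier (as the "predicted category") and to every mini-classifier whose confusable set contains it. The names `mini_classifiers_for`, `sample_training_target`, and the `slot` position table are hypothetical, and the position layout is an assumption.

```python
import random

def mini_classifiers_for(cls, confusable_sets):
    """List the mini-classifiers (indexed by flat predicted class) in which cls appears."""
    owners = [cls]  # cls is the predicted category of its own mini-classifier
    owners += [i for i, members in sorted(confusable_sets.items()) if cls in members]
    return owners

def sample_training_target(cls, confusable_sets, slot, rng):
    """Pick one mini-classifier at random and return (mini-classifier, output position)."""
    owner = rng.choice(mini_classifiers_for(cls, confusable_sets))
    position = cls if owner == cls else slot[(owner, cls)]
    return owner, position

# Hypothetical setup with M = 3 classes; positions 3..5 hold the confusable-set entries
confusable_sets = {0: {1, 2}, 1: {0}}
slot = {(0, 1): 3, (0, 2): 4, (1, 0): 5}

rng = random.Random(0)
owner, position = sample_training_target(2, confusable_sets, slot, rng)
```

Class 2 here can be labeled either at position 2 (its own mini-classifier) or at position 4 (inside the mini-classifier of class 0), which illustrates why a single class may correspond to multiple labels.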

[0056] The sixth step is to combine the flat classifier, major classifier, and third classifier mentioned above as a whole to form a selective hierarchical classifier.

[0057] Test model performance: Figure 3 shows representative examples from the test in this embodiment. The first row contains three original images: steamed fish (Figure 3-(1)), Tremella and red date soup (Figure 3-(2)), and fried shrimp (Figure 3-(3)). Figures 3-(4), 3-(5), and 3-(6) are the Grad-CAM heatmaps of the C-class major classifier for the three original images, and Figures 3-(7), 3-(8), and 3-(9) are the Grad-CAM heatmaps of the M-class flat classifier for the same images. All three images were misclassified by the original M-class flat classifier but were assigned the correct major class by the C-class major classifier. The hierarchical classifier therefore produced a prediction different from that of the flat classifier, and the third classifier chose the hierarchical classifier's prediction as the final class determination, which is a success case for the selective hierarchical classifier. A Grad-CAM heatmap visualizes the image regions the model focuses on when predicting a class: red marks the regions of focus and blue marks the opposite. As the heatmaps show, the C-class discriminator focused on the fish tail, red date, and shrimp tail regions and thus predicted the correct major classes. Conversely, the M-class flat classifier focused on the surrounding areas, resulting in incorrect predictions: steamed fish was predicted as roasted eggplant, Tremella and red date soup as porridge, and fried shrimp as steamed chicken with chili sauce. After category merging, the C-class discriminator is better able to capture major category features than the M-class flat classifier. In scenarios where the flat classifier's prediction is rejected by the third classifier, the hierarchical classifier built on this discriminator can replace the flat classifier, thereby improving the overall classification accuracy of the selective hierarchical classifier.

[0058] This invention was tested on the VireoFood-172 public dataset. The proposed hierarchical classification method was shown to outperform flat classification in large-scale classification tasks where there is no clear hierarchical structure between classes, overcoming the inapplicability of traditional hierarchical structure construction and classification methods in such tasks. Against a flat-classifier baseline of 87.55% accuracy, the hierarchical classifier of this invention achieved an accuracy of 88.39%, while the selective hierarchical classifier achieved an accuracy of 89.18%, demonstrating the effectiveness of the selective hierarchical classification method.
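To put the reported accuracies in perspective, the short calculation below derives the absolute gain in percentage points and the relative reduction in error rate from the three figures quoted above. The helper name `gains` is illustrative; only the three accuracy values come from the text.

```python
flat, hier, selective = 87.55, 88.39, 89.18  # accuracies (%) reported in [0058]

def gains(baseline, improved):
    """Return (absolute gain in points, relative error reduction in %)."""
    absolute = improved - baseline
    relative_error_reduction = absolute / (100.0 - baseline)
    return round(absolute, 2), round(100 * relative_error_reduction, 1)

hier_gain = gains(flat, hier)            # hierarchical vs. flat
selective_gain = gains(flat, selective)  # selective hierarchical vs. flat
print(hier_gain, selective_gain)
```

Under this arithmetic, the selective hierarchical classifier gains 1.63 points over the flat baseline, which corresponds to removing roughly 13% of the flat classifier's errors.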

[0059] From the above description of the embodiments, those skilled in the art will clearly understand that the facilities of the present invention can be implemented using software plus necessary general-purpose hardware platforms. Embodiments of the present invention can be implemented using existing processors, or by dedicated processors used for this or other purposes for suitable systems, or by hardwired systems. Embodiments of the present invention also include non-transitory computer-readable storage media, comprising machine-readable media for carrying or having machine-executable instructions or data structures stored thereon; such machine-readable media can be any available medium accessible by a general-purpose or special-purpose computer or other machine with a processor. For example, such machine-readable media can include RAM, ROM, EPROM, EEPROM, CD-ROM or other optical disc storage, disk storage or other magnetic storage devices, or any other medium that can be used to carry or store the required program code in the form of machine-executable instructions or data structures and is accessible by a general-purpose or special-purpose computer or other machine with a processor. When information is transmitted or provided to a machine via a network or other communication connection (hardwired, wireless, or a combination of hardwired and wireless), that connection is also considered a machine-readable medium.

[0060] Based on the description and accompanying drawings of this invention, those skilled in the art can readily implement or use the food image hierarchy construction and classification method of this invention, and can produce the positive effects described in this invention.

[0061] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Any simple modifications or equivalent changes made to the above embodiments based on the technical essence of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for constructing and classifying food image hierarchies based on error analysis, characterized in that it includes the following steps: S1: Obtain the food image dataset Food-M, divide it into training, validation, and test sets, and train an M-class flat classifier; S2: Based on the prediction errors of the M-class flat classifier on the validation set, find significant major class features and merge the classes to obtain the hierarchical structure corresponding to C major classes; S3: Train a C-class major classifier for the C major classes obtained by merging, and further fuse the discrimination of the M-class flat classifier and the C-class major classifier to obtain an M-class hierarchical classifier and its output discrimination; S4: Based on the predictions of the M-class flat classifier on the validation set, find the set of most easily confused classes for each predicted class; S5: Design a third classifier to make a classification judgment between the predicted category of the flat classifier and the set of most easily confused categories of that category, thereby realizing the determination of the prediction of the flat classifier; the classification judgment includes selecting the prediction of the flat classifier or the hierarchical classifier as the final category determination; S6: Combine the flat classifier, the major classifier, and the third classifier as a whole to form a selective hierarchical classifier.

2. The method for constructing and classifying food image hierarchies based on error analysis according to claim 1, characterized in that: in step S1, the dataset Food-M is divided into training, validation, and test sets in a 6:1:3 ratio, without considering the hierarchical relationship between classes, and an M-class flat classifier is trained.

3. The method for constructing and classifying food image hierarchies based on error analysis according to claim 1, characterized in that step S2 specifically includes: using the M-class flat classifier to perform inference on the validation set; based on all the samples that are predicted incorrectly in the validation set, analyzing pairwise the confusion elements between the true label and the predicted class of each sample, finding significant distinguishable features between the two classes, and finding other classes with the same feature to merge the classes; and treating each category that cannot be merged as a separate major category, to obtain the final hierarchical structure corresponding to C major categories.

4. The method for constructing and classifying food image hierarchies based on error analysis according to claim 1, characterized in that step S3 specifically includes: based on the hierarchical structure corresponding to the C major categories obtained in step S2, training a C-class major classifier for the C major categories; combining the top-5 predicted categories of the M-class flat classifier and the major class predicted by the C-class major classifier, selecting the category that belongs to the predicted major class and ranks first among the top-5 predicted categories as the output discrimination of the M-class hierarchical classifier; and, if no category meets the criteria, selecting the top-1 predicted category of the flat classifier as the output discrimination of the M-class hierarchical classifier.

5. The method for constructing and classifying food image hierarchies based on error analysis according to claim 1, characterized in that step S4 specifically includes: based on the predictions of the M-class flat classifier on the validation set, in the case of an incorrect prediction, adding the true label corresponding to each predicted class to the set of most easily confused classes for that predicted class; and in the case of a correct prediction, adding the category at the top-2 position to the set of most easily confused classes for that predicted class.

6. The method for constructing and classifying food image hierarchies based on error analysis according to claim 1, characterized in that, in step S5: the output categories of the third classifier consist of the M possible predicted categories of the flat classifier and the corresponding M sets of easily confused categories; the i-th set of easily confused categories contains $T_i$ categories, and each of these $T_i$ categories is assigned a separate output position, resulting in a third classifier with $(T + M)$ output categories, where $T_i$ is much smaller than M, T is much larger than M, and $T = \sum_{i=1}^{M} T_i$; during inference, the third classifier, based on the predicted category of the flat classifier, only considers the output position of the predicted category and the single set of easily confused categories corresponding to that predicted category, for a total of $(T_i + 1)$ output positions; if the category predicted by the third classifier is the same as the category predicted by the flat classifier, then the prediction of the flat classifier is selected as the final category determination; if the category predicted by the third classifier is in the set of easily confused categories, then the prediction of the hierarchical classifier is selected as the final category determination.

7. An electronic device, characterized in that it includes: one or more processors; and a storage device for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the method as described in any one of claims 1-6.

8. A computer-readable medium having a computer program stored thereon, characterized in that: When the program is executed by the processor, it implements the method as described in any one of claims 1-6.