A method and apparatus for cross-domain small sample object image classification for smart terminals

By constructing a parallel feature extraction architecture of Euclidean space and hyperbolic space using a dual Riemannian manifold processing module in a smart refrigerator/freezer, the problem of insufficient accuracy in food image recognition in existing technologies is solved, and high-precision classification of food images is achieved.

CN122116011B · Active · Publication Date: 2026-06-30 · QINGDAO GUOCHUANG INTELLIGENT HOME APPLIANCES RES INSTITU +2

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: QINGDAO GUOCHUANG INTELLIGENT HOME APPLIANCES RES INSTITU
Filing Date: 2026-04-28
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Existing object image recognition technologies for smart refrigerators/freezers struggle to effectively model hierarchical and long-tailed semantic relationships in home settings. Their feature discrimination and cluster compactness are insufficient, resulting in low accuracy in identifying subcategories.

Method used

A parallel feature extraction architecture for Euclidean and hyperbolic spaces is constructed using a dual Riemannian manifold processing module. Through collaborative representation in both spaces, global linear features and hierarchical, fine-grained semantic features of food ingredients are extracted respectively. The features of Euclidean and hyperbolic spaces are then fused to enhance the feature representation capability of food ingredient images.

Benefits of technology

By using dual-space collaborative representation of the dual Riemannian manifold processing module, the linear and nonlinear, global and local features of food ingredients are fully covered, solving the problem of hierarchical semantic modeling in image recognition scenarios and improving the recognition accuracy of food ingredient images.

✦ Generated by Eureka AI based on patent content.


Abstract

This application relates to the field of smart home appliance technology, and discloses a method and apparatus for cross-domain small-sample object image classification for smart terminals. The method includes: acquiring an image of food to be identified; processing the image to extract basic feature vectors; mapping the basic feature vectors using a dual Riemannian manifold processing module to obtain Euclidean space features and hyperbolic space features respectively; fusing the Euclidean space features and hyperbolic space features to obtain a comprehensive feature representation for food classification; and determining the food classification result based on the comprehensive feature representation. The dual Riemannian manifold processing module includes a Euclidean space branch and a hyperbolic space branch. By collaboratively extracting Euclidean space features and hyperbolic space features with the dual Riemannian manifold, the method achieves multi-space feature fusion and thereby improves the feature representation capability and classification accuracy of food images.

Description

Technical Field

[0001] This application relates to the field of smart home appliance technology, for example to a method and apparatus for cross-domain small-sample object image classification for smart terminals.

Background Technology

[0002] With the rapid development of IoT and smart home technologies, smart refrigerators / freezers, as core smart appliances in the kitchen, have seen their object recognition and management functions become a key technological direction for improving user experience. Object image recognition technology for smart refrigerators / freezers is mostly based on traditional convolutional neural network models. However, directly applying these image recognition solutions to the actual deployment of smart refrigerators / freezers in home scenarios presents insurmountable technical drawbacks: existing recognition models are mostly based on feature learning in Euclidean space, but the uniformity of Euclidean space cannot effectively model hierarchical, long-tailed semantic relationships, resulting in insufficient feature discrimination and cluster compactness, and low accuracy in subdivided category recognition.

[0003] The related technology discloses an image set classification system and method based on manifold deep learning and extreme learning machine, including a manifold layer, a transformation layer, an orthogonal layer, a projection layer, a pooling layer, an ELM layer, and an output layer. The method first uses a manifold layer to represent multiple view subsets of the same target object in the input image as a point in a Grassmann manifold. Secondly, the transformation layer transforms the orthogonal matrix in the Grassmann manifold into a low-dimensional matrix through a linear mapping. Thirdly, the orthogonal layer forms a Grassmann manifold from the low-dimensional matrix. Fourthly, the projection layer maps the Grassmann manifold to Euclidean space. Then, the pooling layer fuses data from different training branches, reducing the complexity of data feature mapping and controlling overfitting during training. Finally, the ELM layer is used for training, and the training result is output.

[0004] In the process of implementing the embodiments of this disclosure, at least the following problems were found in the related art:

[0005] Although the related technology attempts to break through the limitations of a single Euclidean space by using Grassmann manifolds, its feature extraction scheme relies on a single manifold representation, projection back to Euclidean space, a linear assumption, a single metric, static mapping, and simple fusion, and therefore still cannot solve the problem of hierarchical semantic modeling in image recognition scenarios.

[0006] It should be noted that the information disclosed in the background section above is only used to enhance the understanding of the background of this application, and therefore may include information that does not constitute prior art known to those skilled in the art.

Summary of the Invention

[0007] To provide a basic understanding of some aspects of the disclosed embodiments, a brief summary is given below. This summary is not intended as an extensive overview, nor is it intended to identify key / important components or to delimit the scope of protection of these embodiments; rather, it serves as a prelude to the detailed description that follows.

[0008] This disclosure provides a method and apparatus for cross-domain small-sample object image classification for smart terminals, which achieves multi-space feature fusion by collaboratively extracting Euclidean space features and hyperbolic space features with a dual Riemannian manifold, thereby improving the feature representation capability and classification accuracy of food images.

[0009] In some embodiments, the method includes: acquiring an image of a food item to be identified; processing the image to extract a basic feature vector; mapping the basic feature vector using a dual Riemannian manifold processing module to obtain Euclidean space features and hyperbolic space features, respectively; fusing the Euclidean space features and hyperbolic space features to obtain a comprehensive feature representation for food item classification; and determining the food item classification result based on the comprehensive feature representation. The dual Riemannian manifold processing module includes a Euclidean space branch and a hyperbolic space branch. The dual Riemannian manifold processing module is trained as follows: acquiring a cross-domain food item image sample set, which includes source domain food item images and target domain food item images; based on the cross-domain food item image sample set, constructing corresponding class prototype centers in the Euclidean space of the Euclidean space branch and the hyperbolic space of the hyperbolic space branch, respectively, and updating the prototype centers of each class; calculating gradients from the prototype contrastive losses corresponding to the Euclidean space and the hyperbolic space, based on the updated prototype centers of each class, and backpropagating them to optimize the network parameters of the dual Riemannian manifold processing module; and iteratively optimizing the dual Riemannian manifold processing module until the loss converges, obtaining the trained dual Riemannian manifold processing module.

[0010] In some embodiments, the apparatus includes a processor and a memory storing program instructions, the processor being configured to, when executing the program instructions, perform the aforementioned cross-domain small-sample object image classification method for a smart terminal.

[0011] The method, apparatus, and smart terminal for cross-domain small sample object image classification for smart terminals provided in this disclosure can achieve the following technical effects:

[0012] A parallel feature extraction architecture in Euclidean and hyperbolic spaces is constructed using a dual Riemannian manifold processing module. This dual-space collaborative representation compensates for the limitations of a single Euclidean space. The Euclidean space branch extracts global linear features of the ingredients, suitable for classifying broad ingredient categories. The hyperbolic space branch, leveraging the nonlinear metric properties of the Riemannian manifold, extracts hierarchical and fine-grained semantic features of the ingredients, suitable for fine-grained ingredient classification. After feature fusion, the architecture comprehensively covers both linear and nonlinear, global and local features of the ingredients. This solves the problem of inability to perform hierarchical semantic modeling in image recognition scenarios and overcomes the limitations of single Euclidean space feature extraction.

[0013] The above general description and the description below are exemplary and illustrative only and are not intended to limit this application.

Attached Figure Description

[0014] One or more embodiments are illustrated by way of example with reference to the accompanying drawings. These illustrations and drawings do not constitute a limitation on the embodiments. Elements having the same reference numerals in the drawings denote similar elements. The drawings are not necessarily drawn to scale, and wherein:

[0015] Figure 1 is a schematic diagram of a cross-domain small-sample object image classification method for smart terminals provided in an embodiment of this disclosure;

[0016] Figure 2 is a schematic diagram of the trained dual Riemannian manifold processing module in the method provided in the embodiments of this disclosure;

[0017] Figure 3 is a schematic diagram illustrating the construction of class prototype centers corresponding to Euclidean space and hyperbolic space in the method provided in this embodiment of the disclosure;

[0018] Figure 4 is a schematic diagram illustrating the optimization of network parameters of the dual Riemannian manifold processing module in the method provided in this embodiment of the disclosure;

[0019] Figure 5 is a schematic diagram of a method for obtaining a lightweight student network model provided in the embodiments of this disclosure;

[0020] Figure 6 is a schematic diagram of a method for fine-tuning an initialized joint training architecture, as provided in the embodiments of this disclosure;

[0021] Figure 7 is a schematic diagram illustrating the construction of distillation loss in the method provided in the embodiments of this disclosure;

[0022] Figure 8 is a schematic diagram illustrating the progressive self-training of the fine-tuned joint training architecture using a target domain sample set in the method provided in this embodiment of the disclosure;

[0023] Figure 9 is a schematic diagram of another cross-domain small-sample object image classification method for smart terminals provided in this embodiment of the disclosure;

[0024] Figure 10 is a schematic diagram of a cross-domain small-sample object image classification device for a smart terminal provided in an embodiment of this disclosure;

[0025] Figure 11 is a schematic diagram of a smart terminal provided in an embodiment of this disclosure.

Detailed Implementation

[0026] To provide a more detailed understanding of the features and technical content of the embodiments of this disclosure, the implementation of the embodiments of this disclosure will be described in detail below with reference to the accompanying drawings. The accompanying drawings are for illustrative purposes only and are not intended to limit the embodiments of this disclosure. In the following technical description, for ease of explanation, several details are used to provide a full understanding of the disclosed embodiments. However, one or more embodiments may still be implemented without these details. In other cases, well-known structures and devices may be simplified in their depiction to simplify the drawings.

[0027] The terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate for the embodiments of this disclosure described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion.

[0028] Unless otherwise stated, the term "multiple" means two or more.

[0029] In this embodiment of the disclosure, the character " / " indicates that the objects before and after it are in an "or" relationship. For example, A / B means: A or B.

[0030] The term "and / or" describes an association between objects, indicating that three relationships can exist. For example, A and / or B means: A or B, or A and B.

[0031] The term "correspondence" can refer to an association or binding relationship. The correspondence between A and B means that there is an association or binding relationship between A and B.

[0032] In this embodiment, a smart terminal refers to a home appliance product formed by introducing microprocessors, sensor technology, and network communication technology into home appliances. It possesses characteristics of intelligent control, intelligent sensing, and intelligent applications. The operation of a smart terminal often relies on the application and processing of modern technologies such as the Internet of Things (IoT), the Internet, and electronic chips. For example, smart home appliances can be connected to electronic devices to enable users to remotely control and manage the smart terminal. Specifically, the smart terminal has a built-in edge processor, image acquisition device, and memory. A lightweight student network model is compiled by a neural network compiler and burned into the memory. When the smart terminal triggers the image acquisition device to acquire images of objects within the smart terminal, the lightweight student network model is called to perform forward inference to identify the object category and thus manage the object. Smart terminals include, but are not limited to, smart refrigerators, smart freezers, smart preservation cabinets, and other smart home appliances. Objects include food ingredients, beverages, and other non-food ingredients such as daily necessities and pharmaceuticals. Optionally, objects mainly include food ingredients and beverages.

[0033] As shown in Figure 1, an embodiment of this disclosure provides a cross-domain small-sample object image classification method for smart terminals, including:

[0034] S101, the processor acquires the image of the food to be identified and processes the image to extract the basic feature vector.

[0035] S102, the processor uses the dual Riemannian manifold processing module to map the basic feature vector to obtain Euclidean space features and hyperbolic space features, respectively; wherein the dual Riemannian manifold processing module includes a Euclidean space branch and a hyperbolic space branch.

[0036] S103, the processor fuses the Euclidean space features and hyperbolic space features to obtain a comprehensive feature representation for food classification.

[0037] S104, the processor determines the food classification result based on the comprehensive feature representation.

[0038] Here, the image of the object to be identified is acquired through the built-in image acquisition device of the smart terminal. This image is a real-world image of the object, i.e., a food image, within the smart terminal's actual usage scenario, and is adaptable to complex lighting conditions, shooting angles, and other environmental features within the smart terminal. A feature extraction module processes the food image to be identified, extracting a basic feature vector based on spatial-spectral fusion. This feature extraction module can be a deep convolutional neural network. The food image undergoes convolution operations, feature purification, and global average pooling operations within the deep convolutional neural network, outputting a one-dimensional basic feature vector with uniform dimensions. While extracting basic visual features such as food texture and edges, the feature extraction module also eliminates interference from ambient lighting and variance in food position translation within the smart terminal.

[0039] The basic feature vectors are input into the dual Riemannian manifold processing module. The Euclidean space branch in this module, through regular convolution and fully connected layers, outputs flat-space features (Euclidean space features) used to delineate broad categories such as solids / liquids. The hyperbolic space branch in the same module maps the features onto a hyperbolic manifold via the exponential map, outputting hyperbolic space features that represent the hierarchical structure of the food ingredients. The module thus achieves a collaborative representation across Euclidean and hyperbolic spaces: it uses Euclidean space to delineate the boundaries of major food categories while leveraging the negative curvature of hyperbolic space to embed the tree-like hierarchy of food categories with low distortion, making fuller use of the limited samples and improving fine-grained food identification in small-sample scenarios.

[0040] The Euclidean space features and hyperbolic space features are concatenated along the channel dimension and fused with cross-dimensional semantic alignment, which eliminates the distributional differences between the two kinds of spatial features and yields a more comprehensive feature representation with richer dimensions. This comprehensive feature representation retains both the linear representational capability of Euclidean space and the nonlinear representational capability of hyperbolic space. Finally, the comprehensive feature representation is input into a classification layer, such as a prototype Softmax classifier based on Riemannian metrics, which outputs the predicted probability distribution over the categories. The category with the maximum probability is taken as the food ingredient classification result, which is then fed back to the smart terminal for food ingredient management.
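The fusion and classification step can be illustrated with a minimal sketch. The text does not specify the alignment operation or the exact form of the prototype Softmax classifier, so the version here makes simplifying assumptions: fusion is plain concatenation, and each class is scored by the negative squared Euclidean distance between the fused vector and a class prototype. The function names, the `temperature` parameter, and the sample values are hypothetical.

```python
import math

def fuse(f_euc, f_hyp):
    """Concatenate the two branch outputs along the feature (channel) axis."""
    return list(f_euc) + list(f_hyp)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(fused, prototypes, temperature=1.0):
    """Score classes by negative squared distance to each prototype
    (an assumed metric) and return (argmax class index, probabilities)."""
    scores = [-sum((a - b) ** 2 for a, b in zip(fused, p)) / temperature
              for p in prototypes]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

# Toy example: 2-D Euclidean features and 2-D hyperbolic features.
f_euc = [0.9, 0.1]
f_hyp = [0.2, -0.3]
fused = fuse(f_euc, f_hyp)
prototypes = [[1.0, 0.0, 0.2, -0.3],   # class 0 prototype in the fused space
              [0.0, 1.0, -0.4, 0.5]]   # class 1 prototype in the fused space
label, probs = classify(fused, prototypes)
```

In this toy case the fused vector lies much closer to the first prototype, so class 0 receives the highest probability.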

[0041] Thus, by using a dual Riemannian manifold architecture that combines Euclidean and hyperbolic spaces, the limitations of existing methods in extracting features from a single Euclidean space are completely overcome. The complementary and synergistic features of the two spaces significantly enhance the representational dimension and discriminative ability of food image features, effectively solving the core defect that a single Euclidean space cannot model nonlinear and hierarchical features.

[0042] With reference to Figure 2, in step S102, the dual Riemannian manifold processing module is obtained by training in the following way:

[0043] S2101, the processor acquires a cross-domain food image sample set, which includes source domain food images and target domain food images.

[0044] S2102, the processor constructs, based on the cross-domain food image sample set, corresponding class prototype centers in the Euclidean space of the Euclidean space branch and in the hyperbolic space of the hyperbolic space branch, and updates the prototype centers of each class.

[0045] S2103, the processor calculates gradients from the prototype contrastive losses corresponding to the Euclidean space and the hyperbolic space, based on the updated prototype centers of each class, and backpropagates them to optimize the network parameters of the dual Riemannian manifold processing module.

[0046] S2104, the processor iteratively optimizes the dual Riemannian manifold processing module until the loss converges, and obtains the trained dual Riemannian manifold processing module.

[0047] Here, a cross-domain food image sample set is collected and constructed. The sample set includes a sample set of food images from the source domain and a sample set of food images from the target domain. The food images in the source domain sample set include a small number of labeled food images collected under a standardized laboratory environment, while the food images in the target domain sample set include a large number of unlabeled food images collected in real-world scenarios on smart terminals, which fits the actual application scenario of small cross-domain samples on smart terminals.

[0048] Based on a cross-domain food image sample set, corresponding class prototype centers are constructed for each food category in both the Euclidean space (within the Euclidean space branch) and the hyperbolic space (within the hyperbolic space branch). These class prototype centers represent the mean features of the corresponding food category. To avoid prototype center drift caused by small-sample gradient updates, an exponential moving average (EMA) is used to dynamically update the prototype centers for each category. The update formula for the prototype centers in the t-th iteration is:

[0049] $P_k^{(t)} = \gamma\, P_k^{(t-1)} + (1-\gamma)\,\frac{1}{|B_k|}\sum_{i \in B_k} f_i^{(t)}$.

[0050] where $P_k^{(t)}$ denotes the class prototype center of the k-th food class after the t-th iteration, comprising the Euclidean space class prototype center $P_{euc,k}$ and the hyperbolic space class prototype center $P_{hyp,k}$; γ is the momentum coefficient of the exponential moving average, a preset fixed constant that controls the update weight of the historical prototype center; $B_k$ denotes the set of samples in the current training batch that belong to the k-th food class; and $f_i^{(t)}$ denotes the dual-space features extracted from the i-th sample by the dual Riemannian manifold processing module in the t-th iteration, comprising the Euclidean space features $f_{euc,i}$ and the hyperbolic space features $f_{hyp,i}$.
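The exponential-moving-average prototype update can be sketched as follows. This is an illustrative reading of the described update, assuming the new prototype is a convex combination of the previous prototype and the mean feature of the current batch samples of that class; the function names and the default `gamma=0.9` are hypothetical.

```python
def batch_mean(features):
    """Element-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]

def ema_update(prototype, batch_features, gamma=0.9):
    """Return gamma * old prototype + (1 - gamma) * batch mean for one class.

    If the batch holds no sample of this class, the prototype is unchanged.
    """
    if not batch_features:
        return list(prototype)
    mean = batch_mean(batch_features)
    return [gamma * p + (1.0 - gamma) * m for p, m in zip(prototype, mean)]

# One update step for a single class with two samples in the batch.
old_prototype = [1.0, 0.0]
class_batch = [[0.0, 2.0], [2.0, 2.0]]   # batch mean is [1.0, 2.0]
new_prototype = ema_update(old_prototype, class_batch, gamma=0.9)
```

A larger `gamma` keeps the prototype closer to its history, which is the stated purpose of damping drift from small-sample batches. Because the update is a convex combination, points that start inside the Poincaré ball stay inside it.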

[0051] The network parameters are optimized using the prototype contrastive loss. Based on the updated prototype centers, the Euclidean space prototype contrastive loss and the hyperbolic space prototype contrastive loss are calculated separately, and the two losses are summed to obtain the joint prototype contrastive loss. The network gradient is calculated from the joint prototype contrastive loss and backpropagated through the dual Riemannian manifold processing module to iteratively optimize the linear transformation parameters of the Euclidean space branch, the Riemannian exponential map parameters of the hyperbolic space branch, and the Möbius addition operation parameters.
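The text does not give the exact form of the prototype contrastive loss, so the sketch here assumes one common choice: a cross-entropy over the softmax of negative distances to all class prototypes, computed once with the Euclidean distance and once with the Poincaré-ball geodesic distance, then summed. The temperature `tau`, the curvature parameter `c`, and all function names are assumptions for illustration only.

```python
import math

def sq_norm(x):
    return sum(a * a for a in x)

def euclidean_dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def poincare_dist(x, y, c=1.0):
    """Geodesic distance on the Poincare ball of curvature -c (closed form)."""
    diff = sum((a - b) ** 2 for a, b in zip(x, y))
    den = (1.0 - c * sq_norm(x)) * (1.0 - c * sq_norm(y))
    return math.acosh(1.0 + 2.0 * c * diff / den) / math.sqrt(c)

def proto_loss(feature, prototypes, label, dist, tau=1.0):
    """Cross-entropy of softmax(-distance / tau) against the true class."""
    logits = [-dist(feature, p) / tau for p in prototypes]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]

def joint_loss(f_euc, f_hyp, protos_euc, protos_hyp, label, c=1.0):
    """Sum of the Euclidean-space and hyperbolic-space prototype losses."""
    l_euc = proto_loss(f_euc, protos_euc, label, euclidean_dist)
    l_hyp = proto_loss(f_hyp, protos_hyp, label,
                       lambda a, b: poincare_dist(a, b, c))
    return l_euc + l_hyp

# Toy sample that sits near the class-0 prototypes in both spaces.
f_euc, f_hyp = [1.0, 0.0], [0.3, 0.1]
protos_euc = [[0.9, 0.1], [-1.0, 0.0]]
protos_hyp = [[0.25, 0.1], [-0.5, 0.2]]
loss_true = joint_loss(f_euc, f_hyp, protos_euc, protos_hyp, label=0)
loss_false = joint_loss(f_euc, f_hyp, protos_euc, protos_hyp, label=1)
```

The loss is lower when the labeled class is the one whose prototypes are nearest, so minimizing it pulls features toward their own class prototype and away from the others.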

[0052] Iterative optimization continues until the loss converges. This involves repeatedly executing the optimization cycle of forward inference on samples, prototype center update, prototype contrastive loss calculation, and gradient backpropagation until the prototype contrastive loss no longer decreases and stabilizes. At this point, the network parameters of the dual Riemannian manifold processing module reach their optimal state, yielding the trained, stable dual Riemannian manifold processing module.
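The stopping condition "the loss no longer decreases" is not quantified in the text. One simple way to make it concrete, shown here purely as an assumption, is a patience rule: training stops when the best loss of the last few iterations has not improved on the earlier best by at least a small margin. The names `patience` and `min_delta` are hypothetical.

```python
def has_converged(loss_history, patience=3, min_delta=1e-4):
    """Return True when the last `patience` losses improve on the best
    earlier loss by less than `min_delta`."""
    if len(loss_history) <= patience:
        return False
    best_before = min(loss_history[:-patience])
    best_recent = min(loss_history[-patience:])
    return best_before - best_recent < min_delta

still_improving = has_converged([1.0, 0.8, 0.6, 0.4, 0.2])
plateaued = has_converged([1.0, 0.5, 0.3, 0.3, 0.3, 0.3])
```

In a training loop, this check would run after each cycle of forward inference, prototype update, loss calculation, and backpropagation.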

[0053] This disclosure presents a cross-domain small-sample object image classification method for smart terminals. It employs a dual Riemannian manifold processing module to construct a parallel feature extraction architecture using Euclidean and hyperbolic spaces. This dual-space collaborative representation compensates for the limitations of a single Euclidean space. The Euclidean space branch extracts global linear features of the ingredients, suitable for classifying broad ingredient categories. The hyperbolic space branch, leveraging the nonlinear metric properties of the Riemannian manifold, extracts hierarchical and fine-grained semantic features of the ingredients, suitable for fine-grained ingredient classification. After feature fusion, the method comprehensively covers both linear and nonlinear, global and local features of the ingredients. This solves the problem of inability to perform hierarchical semantic modeling in image recognition scenarios and overcomes the limitations of single Euclidean space feature extraction.

[0054] Optionally, the processor may perform Z-score normalization preprocessing on the food image before processing it to extract the basic feature vector.

[0055] Here, the food image is first parsed into a three-dimensional tensor $X \in \mathbb{R}^{C \times H \times W}$, where C is the number of color channels (in this example, the three RGB channels), H and W represent the adjusted unified spatial resolution, and R is the set of real numbers constituting the image data. To eliminate the severe impact of the complex internal environment of the smart terminal, such as frost occlusion and alternating warm and cold LED light sources, on the activation values of the underlying filters in the feature extraction module, channel-level Z-score normalization is performed on the image tensor:

[0056] $\hat{x}_m = \frac{x_m - \mu_m}{\sigma_m}$.

[0057] where $\hat{x}_m$ is the standardized pixel value, $x_m$ is the original pixel value, $\mu_m$ is the mean of the m-th channel, and $\sigma_m$ is the standard deviation of the m-th channel. The preprocessed batch tensor is then input into the feature extraction module to extract the basic feature vector. It should be noted that this preprocessing is also performed on the sample images of the target and source domains during the training of the dual Riemannian manifold processing module.
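Channel-level Z-score normalization can be sketched in a few lines. The text does not say whether the channel mean and standard deviation are computed per image or over a dataset; this sketch assumes per-image statistics and adds a small `eps` to the denominator to avoid division by zero on flat channels. Both choices, and the function name, are assumptions.

```python
import math

def zscore_per_channel(image, eps=1e-8):
    """Normalize each channel of a C x H x W nested list to zero mean
    and (approximately) unit standard deviation."""
    normalized = []
    for channel in image:
        pixels = [p for row in channel for p in row]
        mean = sum(pixels) / len(pixels)
        var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
        std = math.sqrt(var)
        normalized.append([[(p - mean) / (std + eps) for p in row]
                           for row in channel])
    return normalized

# Two channels with very different brightness, as under mixed lighting.
image = [
    [[0.0, 2.0], [4.0, 6.0]],          # dim channel
    [[200.0, 210.0], [220.0, 230.0]],  # bright channel
]
normalized = zscore_per_channel(image)
```

After normalization both channels share the same scale, so a global brightness shift in one channel no longer dominates the filter activations.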

[0058] Optionally, the processor using a feature extraction module to process the food image to be identified and extract the spatial-spectral fusion basic feature vector includes:

[0059] The processor purifies features through multiple cascaded residual blocks of the feature extraction module to obtain a high-dimensional abstract semantic feature map.

[0060] The processor uses a global average pooling operator to integrate and average the high-dimensional abstract semantic feature map, and outputs the one-dimensional spatial-spectral fusion basic feature vector.

[0061] Here, the feature extraction module is a deep convolutional residual network. Feature extraction is performed through multiple cascaded residual blocks at the bottom layer of the deep convolutional residual network: each residual block sequentially performs two-dimensional convolution, batch normalization, and nonlinear activation operations. The forward propagation expression of the l-th residual block is written in terms of $H^{(l-1)}$, the input features from layer l-1; the weight matrix of the l-th two-dimensional convolution layer, with × denoting the convolution operation; a nonlinear activation function; and batch normalization. Through layer-by-layer downsampling of the residual blocks, a high-dimensional abstract semantic feature map is extracted. At the end of the backbone of the deep convolutional residual network, a global average pooling operator integrates and averages the high-dimensional abstract semantic feature map of dimension B × d_in × h × w along the height and width dimensions, which eliminates the translation variance caused by positional shifts of the ingredients, and outputs the one-dimensional spatial-spectral fusion basic feature vector of dimension d_in. Here B is the batch size, i.e., the number of food images input into the model at one time (for example, 32), and h and w are the height and width of the feature map, respectively.
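The effect of global average pooling on positional shifts can be seen in a small sketch. It uses nested Python lists shaped B × d_in × h × w in place of a tensor library, and the function name and sample feature maps are hypothetical.

```python
def global_average_pool(feature_maps):
    """Average each channel over height and width.

    feature_maps: nested list shaped B x d_in x h x w.
    Returns a nested list shaped B x d_in.
    """
    pooled = []
    for sample in feature_maps:
        vector = []
        for channel in sample:
            values = [v for row in channel for v in row]
            vector.append(sum(values) / len(values))
        pooled.append(vector)
    return pooled

# The same activation appears at two different positions (B=2, d_in=1, h=w=3).
batch = [
    [[[9.0, 0.0, 0.0],
      [0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0]]],
    [[[0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0],
      [0.0, 0.0, 9.0]]],
]
pooled = global_average_pool(batch)
```

Both samples pool to the same vector even though the activation moved from one corner to the other, which is the translation insensitivity the paragraph describes.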

[0062] Optionally, the hyperbolic space branch adopts a Poincaré ball model with negative curvature, and the curvature is constant.

[0063] Here, a Poincaré ball model with constant negative curvature is used to stably construct the geometric metric of hyperbolic space, ensuring the consistency and robustness of the hyperbolic space feature mapping. The geometry of negative curvature matches the hierarchical, tree-like semantic distribution of food categories, strengthening the ability of hyperbolic space to model hierarchical semantic features and compensating for the inability of a single Euclidean space to express hierarchical subordinate relationships. Specifically, the basic feature vectors are projected onto the hyperbolic Riemannian manifold represented by the negative-curvature Poincaré ball model through the Riemannian exponential map. Under the geometric constraints of negative curvature, stable hyperbolic space features are generated through the Riemannian exponential map and Möbius addition, and the geodesic distance between hyperbolic features is calculated under this curvature to represent their spatial distribution. Because the feature distance between samples is measured by the hyperbolic geodesic distance, fine-grained, long-tailed food items at the edge of the hierarchy obtain a larger separability margin, which represents the fine-grained subordinate relationships of food items. Keeping the curvature constant also avoids feature drift caused by fluctuations in the geometric parameters of hyperbolic space, ensuring stable and consistent extraction of hierarchical food semantics and meeting the need for accurate identification of food subcategories in smart terminal scenarios.
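The claim that items near the edge of the hierarchy gain a larger separability margin can be checked numerically. The sketch uses the standard closed-form geodesic distance of the Poincaré ball (sphere) model with curvature -c; the text does not state which distance formula it uses, so this form, the function name, and the sample points are assumptions.

```python
import math

def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball
    of curvature -c, using the arcosh closed form."""
    sq = lambda v: sum(a * a for a in v)
    diff = sum((a - b) ** 2 for a, b in zip(x, y))
    arg = 1.0 + 2.0 * c * diff / ((1.0 - c * sq(x)) * (1.0 - c * sq(y)))
    return math.acosh(arg) / math.sqrt(c)

# Two pairs with the same Euclidean gap of 0.10:
d_center = poincare_distance([0.00, 0.0], [0.10, 0.0])  # near the origin
d_edge = poincare_distance([0.85, 0.0], [0.95, 0.0])    # near the boundary
```

The pair near the boundary is several times farther apart in hyperbolic distance than the pair near the origin, even though both pairs are 0.10 apart in Euclidean terms. This is why fine-grained leaf categories, which embed near the boundary, are easier to separate.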

[0064] Optionally, in step S102, the processor using the dual Riemannian manifold processing module to map the basic feature vectors to obtain hyperbolic space features includes:

[0065] The processor performs a linear transformation on the basic feature vectors to obtain tangent space vectors of the hyperbolic space.

[0066] The processor projects the tangent space vectors onto the Poincaré ball model through the Riemannian exponential map to obtain the initial hyperbolic space features.

[0067] The processor corrects the initial hyperbolic space features through algebraic operations based on Möbius addition, thereby obtaining the final hyperbolic space features.

[0068] Here, the basic feature vectors are projected onto hyperbolic space using the Riemannian exponential map to obtain hyperbolic space features. The feature transformation process, which combines the Riemannian exponential map of the Poincaré ball model with the Möbius addition of the hyperbolic manifold, ensures low-distortion embedding of the basic feature vectors in hyperbolic space and adheres to the metric axioms of hyperbolic geometry. This allows the hyperbolic space features to accurately represent the tree-like hierarchical structure of the ingredients.

[0069] Specifically, the hyperbolic space adopts a Poincaré ball model with negative curvature −c, where c > 0 is a preset fixed value that defines the curvature of the hyperbolic space. The d-dimensional Poincaré ball with curvature −c is $\mathbb{D}_c^d = \{x \in \mathbb{R}^d : c\|x\|^2 < 1\}$, where $\mathbb{R}$ is the set of real numbers. This model suits the exponential growth of the hierarchical tree structure of food ingredients. During projection, the basic feature vectors are linearly transformed into the Euclidean tangent space of the Poincaré ball at the origin $x_0 = 0$, yielding the transition feature vector, i.e., the tangent space vector $v$. A learnable Riemannian exponential map layer then uses the conformal factor to map the transition feature vector onto the Poincaré ball, generating the initial hyperbolic space feature $f_{hyp}$. The Riemannian exponential map is computed as follows:

[0070] $$f_{hyp} = \exp_{0}^{c}(v) = \tanh\left(\sqrt{c}\,\|v\|\right)\frac{v}{\sqrt{c}\,\|v\|}.$$

[0071] Within the hyperbolic space of the Poincaré ball, the initial hyperbolic space features are transformed using Möbius addition to obtain the final hyperbolic space features. Because hyperbolic space is not a vector space, the commutative and associative addition of Euclidean space does not carry over, and conventional linear addition cannot be applied directly to the features; Möbius addition, which is specific to hyperbolic geometry, is therefore used for the feature transformation. For any two feature vectors x and y in the Poincaré ball model, Möbius addition is defined as follows:

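[0072] $$x \oplus_c y = \frac{\left(1 + 2c\langle x, y\rangle + c\|y\|^2\right)x + \left(1 - c\|x\|^2\right)y}{1 + 2c\langle x, y\rangle + c^2\|x\|^2\|y\|^2}.$$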

[0073] Here, $\oplus_c$ denotes the Möbius addition operator based on the curvature constant c, an algebraic operation specific to hyperbolic Riemannian manifolds; x and y denote hyperbolic feature vectors located within the Poincaré ball model, which in this embodiment of the present disclosure are the initial hyperbolic space features.

[0074] Thus, by combining the cross-space projection of the Riemannian exponential map with the manifold-level correction of Möbius addition, the basic feature vectors are transformed into hyperbolic space features. This realizes an effective mapping from Euclidean space to hyperbolic space while strictly respecting the intrinsic properties of hyperbolic geometry, which ensures that the hyperbolic space features are valid and that the final features embed the tree-like hierarchical relationships of the ingredients accurately and with low distortion.
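The mapping of paragraphs [0065] to [0067] can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the patented implementation: it uses the standard exponential map at the origin and the standard Möbius addition of the Poincaré ball, and the helper names (`expmap0`, `mobius_add`), the sample values, and the idea of applying the correction as a Möbius-added offset point are all hypothetical.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(a * a for a in v))

def expmap0(v, c):
    """Exponential map at the origin of the Poincare ball with curvature -c."""
    n = norm(v)
    if n == 0.0:
        return list(v)  # the zero tangent vector maps to the origin
    scale = math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n)
    return [scale * a for a in v]

def mobius_add(x, y, c):
    """Mobius addition of two points x and y that lie inside the ball."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    coef_x = 1 + 2 * c * xy + c * y2
    coef_y = 1 - c * x2
    denom = 1 + 2 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]

c = 1.0                                # preset curvature constant (curvature is -c)
v = [0.6, -0.8]                        # tangent space vector (made-up values)
f_init = expmap0(v, c)                 # initial hyperbolic space feature
offset = expmap0([0.1, 0.2], c)        # hypothetical correction term, itself a ball point
f_hyp = mobius_add(f_init, offset, c)  # corrected feature, still inside the ball
```

Both `f_init` and `f_hyp` satisfy c‖x‖² < 1, which is the closure property the description relies on.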

[0075] Optionally, in step S102, the processor using the dual Riemannian manifold processing module to map the basic feature vectors to obtain Euclidean space features includes:

[0076] In the Euclidean space branch, the processor retains the basic feature vectors or linearly transforms them to obtain the Euclidean space features.

[0077] Here, in the Euclidean space branch, the basic feature vectors are preserved or linearly transformed. The linear transformation is a combination of convolution and fully connected layers, which can adjust the dimensions and enhance the semantics of the basic feature vectors according to the requirements of food feature representation. If the dimensions and representational capabilities of the basic feature vectors already meet the requirements for class differentiation in Euclidean space, the basic feature vectors can be directly retained as Euclidean space features. Euclidean space features possess translation invariance, clearly defining the global decision boundaries for food categories such as solid / liquid, meat / vegetables, and fresh / cooked food. Furthermore, within the Euclidean space branch, the feature distance between samples is calculated using the standard L2 norm, ensuring the separability of class features.

[0078] With reference to Figure 3, optionally, in step S2102, constructing the corresponding class prototype centers in the Euclidean space of the Euclidean space branch and in the hyperbolic space of the hyperbolic space branch, respectively, based on the cross-domain food image sample set includes:

[0079] S2121, the processor initializes the prototype centers of each food category in Euclidean space and hyperbolic space.

[0080] S2122, the processor calculates the mean value of the corresponding category features based on the features of similar food samples in the current training batch.

[0081] S2123, the processor smoothly updates the corresponding class prototype center in Euclidean space based on the calculated mean and exponential moving average of the corresponding class features.

[0082] S2124, the processor uses Möbius addition and an exponential moving average to update the corresponding class prototype center in hyperbolic space.

[0083] Here, for all food categories, the class prototype centers of the Euclidean space branch and the hyperbolic space branch are initialized separately. The initial Euclidean space class prototype center $P_{euc,k}$ and hyperbolic space class prototype center $P_{hyp,k}$ of each food category can be assigned by zero-vector initialization or random initialization. Preferably, the prototype centers are initialized based on the source domain sample set, so that general and stable food category and subcategory features are learned as the initial prototype values.

[0084] Then, the class prototypes are updated based on the target domain sample set. The Euclidean space feature $f_{euc,i}$ and the hyperbolic space feature $f_{hyp,i}$ of each food sample in the current training batch are extracted by the dual Riemannian manifold processing module. Samples of the same ingredient category are grouped according to their true labels, and the arithmetic mean of the Euclidean space features and of the hyperbolic space features of each ingredient category within the current batch is calculated to obtain the corresponding category feature mean. Euclidean space supports standard numerical operations; therefore, standard numerical averaging and an exponential moving average (EMA) are used to smoothly update the class prototype centers, suppressing the prototype drift caused by small sample batches and ensuring update stability. Hyperbolic space is a Riemannian manifold of negative curvature; there, Möbius addition and an exponential moving average (EMA) are used to update the class prototype centers, strictly following the geometric operation rules of the hyperbolic manifold so that the prototype update remains closed and accurate. With this differentiated dual-space update, the Euclidean space prototype relies on linear operations for smooth stability, while the hyperbolic space prototype relies on Möbius addition to respect the manifold geometry. Together they ensure accurate and reliable class prototype centers in the dual Riemannian manifold processing module for cross-domain small-sample ingredient classification.

[0085] Optionally, in step S2121, the processor initializes the prototype centers of each food category in Euclidean space and hyperbolic space, including:

[0086] The processor initializes the prototype centers of each category in Euclidean space as zero vectors or small-scale random vectors.

[0087] The processor initializes the prototype centers of each category in hyperbolic space as vectors near the origin of the Poincaré ball model, ensuring that the initial prototype centers are constrained within the hyperbolic manifold.

[0088] Here, a differentiated initialization strategy is adopted to account for the different geometric properties of Euclidean and hyperbolic spaces. It keeps the initialization of the Euclidean space prototypes simple and efficient without interfering with subsequent smooth updates, and it ensures that the initial state of the hyperbolic space prototypes satisfies the manifold constraints of the Poincaré ball model, preventing computational failures caused by an initial prototype lying outside the hyperbolic manifold and providing a stable foundation for the iterative updates of the class prototype centers. Specifically, Euclidean space is a linear, flat geometric space that supports standard linear algebra operations and places no strict range constraint on the initial vector. Therefore, the class prototype centers of the food categories in Euclidean space are uniformly initialized to zero vectors or to small-scale random vectors with a very small value range. This quickly provides baseline values for the Euclidean space prototypes and keeps their iteration stable.

[0089] The hyperbolic space branch adopts a Poincaré ball model with negative curvature, a manifold with strict spatial constraints. Hyperbolic features and class prototype centers must lie within the Poincaré ball for operations such as Möbius addition and the Riemannian exponential map to remain closed and valid. Therefore, the class prototype centers of the food categories in hyperbolic space are initialized as small-scale vectors near the origin of the Poincaré ball model, so that the initial prototype centers satisfy the manifold constraint and are strictly confined to the valid region of the hyperbolic manifold. This prevents geometric distortion in subsequent hyperbolic prototype updates and feature metric calculations caused by an out-of-bounds initial prototype, and ensures the reliability of hierarchical semantic modeling in hyperbolic space.
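The differentiated initialization can be illustrated as follows. This is a hedged sketch only: the function name `init_prototypes`, the `scale` of 1e-3, and the fixed seed are hypothetical choices, and the check c‖P‖² < 1 is the usual membership condition of a Poincaré ball with curvature −c, assumed here rather than quoted from the source.

```python
import random

def init_prototypes(num_classes, dim, c, scale=1e-3, seed=0):
    """Return (Euclidean prototypes, hyperbolic prototypes) for all classes.

    Euclidean prototypes start as zero vectors; hyperbolic prototypes start
    as small random vectors near the origin of the Poincare ball.
    """
    rng = random.Random(seed)
    p_euc = [[0.0] * dim for _ in range(num_classes)]
    p_hyp = [[rng.uniform(-scale, scale) for _ in range(dim)]
             for _ in range(num_classes)]
    for p in p_hyp:
        # Assumed manifold constraint: every prototype must lie strictly inside the ball.
        if c * sum(a * a for a in p) >= 1.0:
            raise ValueError("hyperbolic prototype lies outside the Poincare ball")
    return p_euc, p_hyp

p_euc, p_hyp = init_prototypes(num_classes=3, dim=4, c=1.0)
```

Starting close to the origin leaves the largest possible margin to the ball boundary, where distances grow without bound.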

[0090] Optionally, in step S2123, the processor smoothly updates the corresponding class prototype center in Euclidean space based on the calculated mean and exponential moving average of the corresponding class features, including:

[0091] The processor presets the weighting coefficient of the exponential moving average.

[0092] The processor performs a weighted fusion of the historical prototype centers from the previous iteration and the feature mean of similar samples in the current training batch to update the prototype centers in Euclidean space for the current training batch.

[0093] Here, an exponential moving average is used to smoothly integrate the historical prototype with the features of the current batch. Combined with standard numerical averaging to calculate the feature mean of samples of the same category, this effectively suppresses prototype center drift in small-sample cross-domain scenarios and ensures the iterative stability of the Euclidean space prototype centers. Specifically, the weighting coefficient γ of the exponential moving average (EMA) is a preset hyperparameter between 0 and 1. It controls the weight of the historical prototype center during the update, balancing historical prototype information against the feature information of the current batch and avoiding drastic fluctuations of the prototype centers caused by the volatility of a single batch.

[0094] The mean of the Euclidean space features of samples of the same food category within the current training batch is calculated by standard numerical averaging, i.e., as the arithmetic mean of the Euclidean space features of all samples of that category in the current batch, giving the category feature center of the current batch. The historical Euclidean space class prototype center from the previous iteration is then fused, by weighting, with the feature mean of the same-category samples in the current batch, and a smooth exponential moving average update yields the Euclidean space class prototype center for the current training batch. The Euclidean space prototype update formula is as follows:

[0095] $$P_{euc,k}^{(t)} = \gamma\,P_{euc,k}^{(t-1)} + (1-\gamma)\,\frac{1}{N_k}\sum_{i:\,y_i = k} f_{euc,i}^{(t)}.$$

[0096] Here, $P_{euc,k}^{(t)}$ denotes the Euclidean space prototype center of the k-th food category after the t-th iteration; $f_{euc,i}^{(t)}$ denotes the Euclidean space feature of the i-th sample in the t-th iteration, extracted by the dual Riemannian manifold processing module; and $N_k$ denotes the number of samples of the k-th category in the current training batch. The weight of the historical prototype center is the momentum coefficient γ, and the weight of the current batch feature mean is (1 − γ). Through this weighted fusion update, the Euclidean space class prototype center changes smoothly across iterations and is not shifted significantly by sample distribution deviations in small batches of the target domain, which effectively suppresses prototype center drift in small-sample scenarios.

[0097] Optionally, the weighting coefficient γ of the exponential moving average is 0.9. A larger γ gives the old prototype a higher weight, so new batches have little influence and the prototype updates are very stable. A smaller γ gives the old prototype a lower weight, so new batches have a strong influence and the prototype updates are fast but volatile. In this embodiment, γ is typically set to 0.9, so that the prototype is updated slowly on the basis of stable historical information and is not skewed by small batches of samples, balancing stability and update efficiency.
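The Euclidean update described above (class mean followed by an EMA with γ = 0.9) amounts to the following small sketch. The function names and the sample values are illustrative assumptions; only the weighting scheme γ and (1 − γ) comes from the description.

```python
def class_mean(features):
    """Arithmetic mean of a list of equal-length feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]

def ema_update(prototype, batch_features, gamma=0.9):
    """Blend the previous prototype with the current batch mean of one class."""
    mean = class_mean(batch_features)
    return [gamma * p + (1.0 - gamma) * m for p, m in zip(prototype, mean)]

p_old = [1.0, 0.0]                       # prototype from the previous iteration
batch = [[0.0, 2.0], [2.0, 2.0]]         # same-class features, batch mean is [1.0, 2.0]
p_new = ema_update(p_old, batch)         # approximately [1.0, 0.2]
```

With γ = 0.9 the second coordinate moves only one tenth of the way from 0.0 towards the batch mean 2.0, which is the damping effect the paragraph describes.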

[0098] Optionally, in step S2124, the processor using Möbius addition and an exponential moving average, together with the calculated mean of the corresponding class features, to update the corresponding class prototype center in hyperbolic space includes:

[0099] The processor presets the weighting coefficient of the exponential moving average.

[0100] The processor uses Möbius addition to fuse, by weighting, the historical prototype center of the previous iteration with the feature mean of the same-category samples in the current batch, so as to update the hyperbolic space prototype center for the current training batch.

[0101] Here, Möbius addition follows the geometric operation rules of hyperbolic manifolds and is combined with an exponential moving average to achieve smooth prototype updates. This keeps the update of the hyperbolic space class prototype centers closed on the manifold while effectively suppressing prototype drift in small-sample cross-domain scenarios, thus improving the stability of hierarchical semantic modeling in hyperbolic space. The preset weighting coefficient of the exponential moving average is as described above and is not repeated here. The mean hyperbolic space feature of samples of the same food category within the current training batch is calculated by standard numerical averaging to obtain the hyperbolic space class feature center of the current batch. Then, following the geometric operation rules of hyperbolic Riemannian manifolds, Möbius addition is used to fuse, by weighting, the historical hyperbolic space class prototype center from the previous iteration with the mean hyperbolic space feature of the same-category samples in the current batch, completing the smooth exponential moving average update and yielding the hyperbolic space class prototype center for the current training batch.

[0102] The formula for updating the prototype center of the hyperbolic space class is as follows: .

[0103] Here, $P_{hyp,k}^{(t)}$ denotes the hyperbolic space prototype center of the k-th food category after the t-th iteration, and $f_{hyp,i}^{(t)}$ denotes the hyperbolic space feature of the i-th sample in the t-th iteration, extracted by the dual Riemannian manifold processing module.
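One plausible reading of the Möbius-based EMA update is sketched below. It is an assumption, not the source's formula: the two weights γ and (1 − γ) are applied with Möbius scalar multiplication, and the two weighted points are then combined with Möbius addition. All function names and sample values are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def mobius_add(x, y, c):
    """Mobius addition on the Poincare ball with curvature -c."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    denom = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * a + (1 - c * x2) * b) / denom
            for a, b in zip(x, y)]

def mobius_scalar(r, x, c):
    """Mobius scalar multiplication: rescale x along its geodesic through the origin."""
    n = norm(x)
    if n == 0.0:
        return list(x)
    sc = math.sqrt(c)
    scale = math.tanh(r * math.atanh(sc * n)) / (sc * n)
    return [scale * a for a in x]

def class_mean(features):
    n = len(features)
    return [sum(column) / n for column in zip(*features)]

def hyperbolic_ema_update(prototype, batch_features, c, gamma=0.9):
    """Assumed update: (gamma (x) P_old) (+) ((1 - gamma) (x) batch mean)."""
    mean = class_mean(batch_features)  # arithmetic mean stays inside the (convex) ball
    old_part = mobius_scalar(gamma, prototype, c)
    new_part = mobius_scalar(1.0 - gamma, mean, c)
    return mobius_add(old_part, new_part, c)

c = 1.0
p_old = [0.30, 0.10]
batch = [[0.20, 0.40], [0.40, 0.20]]
p_new = hyperbolic_ema_update(p_old, batch, c)
```

Under this reading the update is closed on the ball, and a prototype that already equals the batch mean is left unchanged, mirroring the behavior of the Euclidean EMA.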

[0104] With reference to Figure 4, optionally, in step S2103, the processor calculating the gradient and backpropagating it based on the updated prototype centers and the prototype contrast losses corresponding to Euclidean space and hyperbolic space, so as to optimize the network parameters of the dual Riemannian manifold processing module, includes:

[0105] S2131, the processor uses the updated class prototype center as the metric.

[0106] S2132, the processor uses the L2 norm in Euclidean space to calculate the spatial distance between the Euclidean space features of the sample and the corresponding class prototype center.

[0107] S2133, the processor calculates the spatial distance between the hyperbolic space features of the sample and the corresponding class prototype center using hyperbolic geodesic distance in hyperbolic space.

[0108] S2134, the processor constructs prototype contrast loss in Euclidean space and prototype contrast loss in hyperbolic space based on the spatial distance calculated in each space, forming a joint prototype contrast loss.

[0109] S2135, the processor minimizes the joint prototype contrast loss through backpropagation, optimizing the network parameters of the dual Riemannian manifold processing module.

[0110] Here, by using a dual-space differential distance metric and joint loss constraint, features of similar food items are forced to cluster towards their corresponding class prototype centers in both spaces. This improves the compactness of intra-class features and the discriminative power between classes, further enhancing the module's ability to model the semantics of food item hierarchies and adapting to cross-domain small-sample classification scenarios. Specifically, the Euclidean space class prototype centers and hyperbolic space class prototype centers updated based on exponential moving averages are used as benchmarks for measuring sample feature similarity, to assess the degree of matching between sample features and their respective class centers. For Euclidean space features, the L2 norm is used to calculate the spatial distance between the sample's Euclidean space features and the corresponding class prototype centers (i.e., the updated class prototype centers). The calculation formula is as follows:

[0111] $$d_{euc}\left(f_{euc,i}, P_{euc,y_i}\right) = \left\|f_{euc,i} - P_{euc,y_i}\right\|_2.$$

[0112] For hyperbolic space features, the hyperbolic geodesic distance specific to hyperbolic space is used to calculate the spatial distance between the hyperbolic space features of the sample and the corresponding class prototype center, i.e., the updated class prototype center. The calculation formula is as follows:

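[0113] $$d_{hyp}\left(f_{hyp,i}, P_{hyp,y_i}\right) = \frac{2}{\sqrt{c}}\,\operatorname{artanh}\left(\sqrt{c}\,\left\|\left(-f_{hyp,i}\right) \oplus_c P_{hyp,y_i}\right\|\right).$$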

[0114] Euclidean space prototype contrast loss is constructed based on Euclidean space distance, and hyperbolic space prototype contrast loss is constructed based on hyperbolic space distance. The two types of losses are directly summed to form a joint prototype contrast loss, which is the total prototype contrast loss. The formula for calculating the joint prototype contrast loss is:

[0115] .

[0116] Here, $L_{proto}$ denotes the joint prototype contrast loss, used to constrain the features of samples of the same ingredient category to cluster towards the corresponding prototype center; $d_{euc}(\cdot,\cdot)$ denotes the L2-norm distance in Euclidean space, used to calculate the distance between a Euclidean space feature and the corresponding class prototype center; $f_{euc,i}$ denotes the Euclidean space feature of the i-th sample; $P_{euc,y_i}$ denotes the Euclidean space class prototype center of the category to which the i-th sample belongs; $d_{hyp}(\cdot,\cdot)$ denotes the hyperbolic geodesic distance in hyperbolic space, used to calculate the distance between a hyperbolic space feature and the corresponding class prototype center; $f_{hyp,i}$ denotes the hyperbolic space feature of the i-th sample; $P_{hyp,y_i}$ denotes the hyperbolic space class prototype center of the category to which the i-th sample belongs; and $y_i$ denotes the true class label of the i-th sample.
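As an illustration of how the two distances combine into one objective, the sketch below sums the Euclidean and hyperbolic distance of each sample to the prototype of its true class. Averaging over the batch is an assumption made here (the source only states that the two losses are summed directly), and all function names and values are hypothetical.

```python
import math

def euclidean_distance(f, p):
    """L2-norm distance between a feature and a prototype."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, p)))

def mobius_add(x, y, c):
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    denom = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * a + (1 - c * x2) * b) / denom
            for a, b in zip(x, y)]

def hyperbolic_distance(f, p, c):
    """Geodesic distance on the Poincare ball with curvature -c."""
    delta = mobius_add([-a for a in f], p, c)
    sc = math.sqrt(c)
    n = math.sqrt(sum(a * a for a in delta))
    return 2.0 / sc * math.atanh(min(sc * n, 1.0 - 1e-12))  # clamp guards rounding

def joint_prototype_loss(f_euc, f_hyp, labels, p_euc, p_hyp, c):
    """Mean over the batch of (Euclidean distance + hyperbolic distance)."""
    total = 0.0
    for fe, fh, y in zip(f_euc, f_hyp, labels):
        total += euclidean_distance(fe, p_euc[y])
        total += hyperbolic_distance(fh, p_hyp[y], c)
    return total / len(labels)

# Two classes; sample 0 belongs to class 1, sample 1 to class 0.
p_euc = [[0.0, 0.0], [3.0, 4.0]]
p_hyp = [[0.0, 0.0], [0.2, 0.1]]
loss = joint_prototype_loss(
    f_euc=[[3.0, 4.0], [3.0, 4.0]],
    f_hyp=[[0.2, 0.1], [0.0, 0.0]],
    labels=[1, 0], p_euc=p_euc, p_hyp=p_hyp, c=1.0)
```

In this toy batch only the second sample is off its prototype, and only in the Euclidean branch, so the loss is 5.0 / 2 = 2.5 up to rounding.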

[0117] Using the joint prototype contrast loss as the optimization objective, the gradient of the loss with respect to the network parameters of the dual Riemannian manifold processing module is calculated and propagated backward along the network's forward propagation path. The network parameters of the dual Riemannian manifold processing module are iteratively updated using a gradient descent algorithm, continuously minimizing the joint prototype contrast loss until convergence. As a result, the distance between sample features and the corresponding class prototype centers continuously decreases, yielding a stable dual Riemannian manifold processing module.

[0118] Optionally, in step S2133, the processor calculates the spatial distance between the hyperbolic space features of the sample and the corresponding class prototype center using hyperbolic geodesic distance, including:

[0119] The processor uses Möbius addition to calculate the relative vector between the hyperbolic space features of the sample and the corresponding class prototype center.

[0120] The processor calculates the Euclidean norm of the relative vector and substitutes the Euclidean norm into the hyperbolic geodesic distance formula to calculate the spatial distance between the hyperbolic spatial features of the sample and the corresponding class prototype center.

[0121] Here, Möbius addition, which is specific to hyperbolic space, is used to calculate the relative vector $\Delta f$ between the hyperbolic space feature of the sample and the corresponding class prototype center: $\Delta f = \left(-f_{hyp,i}\right) \oplus_c P_{hyp,y_i}$. The standard Euclidean norm $\|\Delta f\|$ of the relative vector is then calculated. Substituting this norm into the predefined hyperbolic geodesic distance formula gives the spatial distance between the hyperbolic space feature of the sample and the corresponding class prototype center: $d_{hyp} = \frac{2}{\sqrt{c}}\,\operatorname{artanh}\left(\sqrt{c}\,\|\Delta f\|\right)$.

[0122] Furthermore, the spatial distance between the hyperbolic spatial features of the sample and the corresponding class prototype center can also be expressed as:

[0123] .

[0124] In this way, the distance between sample features and class prototype centers is calculated under the constraints of hyperbolic manifold, ensuring that the measurement method conforms to the hyperbolic geometric rules and improving the modeling accuracy of food hierarchical semantics in hyperbolic space.
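The two-step distance calculation (Möbius relative vector, then Euclidean norm substituted into the geodesic formula) can be checked numerically. The sketch below also compares it with a commonly used equivalent closed form of the Poincaré ball distance based on arcosh; that closed form is included as a cross-check under the assumption of the standard ball model and is not claimed to be the expression in the source. Function names and test points are hypothetical.

```python
import math

def mobius_add(x, y, c):
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    denom = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * a + (1 - c * x2) * b) / denom
            for a, b in zip(x, y)]

def hyperbolic_distance(x, y, c):
    """Step 1: relative vector via Mobius addition. Step 2: norm into artanh formula."""
    delta = mobius_add([-a for a in x], y, c)
    n = math.sqrt(sum(a * a for a in delta))
    arg = min(math.sqrt(c) * n, 1.0 - 1e-12)  # clamp guards against rounding at the boundary
    return 2.0 / math.sqrt(c) * math.atanh(arg)

def hyperbolic_distance_closed_form(x, y, c):
    """Equivalent arcosh form of the same distance (standard identity)."""
    diff2 = sum((a - b) ** 2 for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    return math.acosh(1 + 2 * c * diff2 / ((1 - c * x2) * (1 - c * y2))) / math.sqrt(c)

c = 0.5
feature = [0.1, 0.2]       # hyperbolic feature of one sample (made-up values)
prototype = [-0.3, 0.4]    # class prototype center (made-up values)
d1 = hyperbolic_distance(feature, prototype, c)
d2 = hyperbolic_distance_closed_form(feature, prototype, c)
```

Both routes return the same value up to floating-point error, and the distance is symmetric in its two arguments.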

[0125] Optionally, the processor iteratively optimizes the dual Riemannian manifold processing module until the loss converges, obtaining the trained dual Riemannian manifold processing module, including:

[0126] After each iteration, the processor calculates the total loss value of the Euclidean space prototype contrast loss and the hyperbolic space prototype contrast loss.

[0127] When the total loss value is continuously reduced to below a preset loss threshold after multiple iterations, the processor stops iterating and obtains the trained dual Riemannian manifold processing module.

[0128] Here, by continuously monitoring the convergence state of the total loss value over multiple rounds, it is possible to accurately determine whether the network parameters have reached the optimal level, avoiding model underfitting caused by stopping iteration too early or model overfitting caused by excessive iteration, and ensuring that the trained dual Riemannian manifold processing module has stable recognition performance in cross-domain small sample food classification scenarios.

[0129] In each iteration, after the prototype centers in both spaces have been updated and the network parameters have been optimized by gradient descent, the Euclidean space prototype contrast loss and the hyperbolic space prototype contrast loss are summed to obtain the total loss value of the current iteration, i.e., the joint prototype contrast loss described above, which measures the current optimization level of the model. A loss threshold and the number of consecutive iterations required for convergence are preset, and the total loss value of each iteration is monitored throughout training. When the total loss value stays below the preset loss threshold for the required number of consecutive iterations, the Euclidean space features and hyperbolic space features have sufficiently converged to the corresponding prototype centers and the model has reached a stable state. The iterative optimization is then stopped, yielding the trained, highly stable dual Riemannian manifold processing module. This convergence check determines a suitable stopping point for training, so that the feature extraction and classification performance of the dual Riemannian manifold processing module meets the needs of cross-domain small-sample food image classification for smart terminals.
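The stopping rule can be expressed as a small predicate over the loss history. The names `should_stop`, `threshold`, and `patience`, as well as the numbers used, are illustrative assumptions; the source only states that a loss threshold and a required number of consecutive iterations are preset.

```python
def should_stop(loss_history, threshold, patience):
    """Return True once the last `patience` total-loss values are all below `threshold`."""
    if len(loss_history) < patience:
        return False
    return all(value < threshold for value in loss_history[-patience:])

# Hypothetical total-loss values recorded after each iteration.
losses = [0.90, 0.40, 0.09, 0.08, 0.07]
stop_now = should_stop(losses, threshold=0.1, patience=3)
```

Requiring several consecutive values below the threshold, rather than a single one, avoids stopping on a momentary dip caused by one favorable batch.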

[0130] Optionally, the feature extraction module, the dual Riemannian manifold processing module, and the classification layer constitute a lightweight student network model. This lightweight student network model is obtained by joint training with topology-preserving relational knowledge distillation and progressive self-training, followed by removal of the teacher network. The dual Riemannian manifold processing module is the stable dual Riemannian manifold processing module obtained above.

[0131] Here, the lightweight student network model is achieved through a two-stage joint training process: topology-preserving knowledge distillation and progressive self-training. After training, the teacher network and related auxiliary modules are removed, and the final deployable lightweight model is obtained through quantization. Specifically, topology-preserving knowledge distillation is used to fine-tune the pre-trained and initialized teacher and student networks. The pre-training initialization of the teacher and student networks is based on the source domain sample set. After pre-training, the initial weights of the teacher network are frozen as the source domain knowledge carrier, and the student network is used as the parameter optimization object. In the fine-tuning stage, labeled samples from the target domain sample set are used to fine-tune the initialized teacher and student networks. Small-sample fine-tuning based on topology-preserving knowledge distillation forces the student network to retain the source domain feature topology of the teacher network, updating only the student network parameters while the teacher network does not participate in gradient updates. This preserves the robust feature topology of the pre-trained source domain, effectively avoiding catastrophic forgetting and overfitting during small-sample cross-domain fine-tuning. Even in scenarios with complex lighting and varying shooting distances within smart terminals, high recognition accuracy is maintained.

[0132] Then, the fine-tuned teacher and student networks are self-trained using the target domain sample set to further optimize the student network. Specifically, the teacher network predicts the unlabeled samples in the target domain sample set and assigns pseudo-labels to them; the student network parameters are updated using both the pseudo-labeled samples and the labeled samples. In addition, the teacher network parameters are smoothly updated from the student network using an exponential moving average (EMA), which filters out noisy gradients caused by erroneous pseudo-labels and improves the prediction accuracy of the teacher network. This process is iterated until the student network converges, resulting in a trained student network. The teacher network and the auxiliary training modules, which are redundant at inference time, are then removed, retaining only the feature extraction module, the dual Riemannian manifold processing module, and the classification layer. The retained network structure undergoes post-training quantization and is then compiled by a neural network compiler to obtain a lightweight student network model that can be deployed on smart terminal edge devices. Thus, through progressive self-training, the large volume of unlabeled data from smart terminals is used efficiently without additional computational cost or manual annotation, progressively improving the model's recognition accuracy and robustness in the target domain. The model retains the robust topology of the source domain while adapting closely to the complex real-world scenarios of smart terminals, and the final output is a student network model that meets edge deployment requirements and enables high-precision food identification.

[0133] Furthermore, the source domain sample set refers to a collection of food image samples with complete category labeling information, collected in a standardized laboratory under controlled conditions. The target domain sample set refers to a collection of food image samples collected in the physical environment inside a user's actual smart terminal. This is the sample set of the actual application scenarios that the model ultimately needs to adapt to. This sample set generally contains a very small subset of labeled samples and a large subset of unlabeled samples. For example, the very small subset of labeled samples is set as a 5-shot small sample, representing newly added food items that the user has just put into the smart terminal and that the smart terminal has never seen before, with only 5 images for each category.

[0134] With reference to Figure 5, optionally, the lightweight student network model is obtained as follows:

[0135] S201, the processor initializes the constructed joint training architecture, which includes teacher and student networks, using the source domain sample set.

[0136] S202, the processor uses topology-preserving relation knowledge distillation to fine-tune the initialized joint training architecture to convergence using labeled samples from the target domain sample set.

[0137] S203, the processor uses the target domain sample set to progressively self-train the fine-tuned joint training architecture.

[0138] S204, after the progressive self-training has converged, the processor removes the teacher network from the joint training architecture and retains only the forward propagation path of the student network to obtain the lightweight student network model.

[0139] Here, the lightweight student network model is based on a joint training architecture of teacher and student networks. After three stages of optimization—source domain sample set initialization, target domain small sample set fine-tuning (i.e., a small sample set consisting of labeled samples in the target domain), and progressive self-training of the target domain sample set—the training-dedicated teacher network and redundant training modules are removed, retaining only the simplified model adapted for deployment on the edge of smart terminals obtained from the forward propagation path of the student network. This process ensures that the model fully learns the general food knowledge of the source domain and deeply adapts to the real-world scenarios of smart terminals in the target domain, while also achieving model lightweighting, meeting the low computing power and low power consumption hardware requirements of smart terminal edge devices.

[0140] Specifically, a teacher network and a student network with the same structure are constructed. Both include a feature extraction module, a dual Riemannian manifold processing module, and a classification layer, forming the joint training architecture of the teacher and student networks. A model consisting of the feature extraction module, the dual Riemannian manifold processing module, and the classification layer is pre-trained on the source domain sample set, which consists of labeled food image samples collected in a standardized laboratory environment. Pre-training on this sample set yields a pre-trained model with general knowledge: the general visual features of food, the rules for determining category boundaries, and the feature topology, which together form a general food recognition knowledge system for the source domain. The teacher network and the student network are initialized by copying the weight parameters of the pre-trained model. After initialization, the weights of the teacher network are frozen and serve as a fixed carrier of source domain knowledge, while the weight parameters of the student network are the optimization targets of subsequent training.

[0141] Based on a topology-preserving knowledge distillation strategy, the initial joint training architecture is fine-tuned with small samples using labeled samples from the target domain sample set until the model converges. The labeled samples in the target domain consist of a very small number of labeled food images from real-world smart terminal environments. During fine-tuning, a prototype Softmax classifier based on Riemannian metrics is constructed using the highly stable bi-Riemannian manifold prototype obtained earlier, to calculate the negative exponential distribution of the distance between bi-space features and class prototypes. To avoid catastrophic forgetting and overfitting during small-sample cross-domain fine-tuning, a topology-preserving knowledge distillation loss is calculated, including distance distillation loss to maintain feature scale consistency and angle distillation loss to maintain semantic direction consistency. This forces the student network to adapt to the target domain smart terminal environment while preserving the source domain feature topology in the teacher network. The student network parameters are updated through backpropagation, while the teacher network does not participate in gradient updates, completing the initial adaptation of the student network to the target domain smart terminal scenario.

[0142] By leveraging the target domain sample set to perform distribution-aware, progressive self-training of the fine-tuned joint training architecture, the value of unlabeled sample data in the target domain is fully exploited. During self-training, the teacher network predicts massive amounts of unlabeled samples, and a dynamic filtering strategy is used to select unlabeled samples and generate pseudo-labels, effectively identifying rare, long-tailed data. The pseudo-labeled samples are then mixed with labeled samples from the target domain and input into the student network for end-to-end closed-loop iterative optimization, continuously updating the student network parameters. The teacher network parameters are smoothly updated by the student network using an exponential moving average, filtering out noise gradients caused by erroneous pseudo-labels. This process continuously optimizes the classification boundary of the teacher network until the recognition accuracy of the student network on the target domain sample set stabilizes, indicating model convergence.

[0143] After the joint training architecture converges through progressive self-training, the trained architecture is streamlined by removing the teacher network dedicated to training. Simultaneously, all redundant training auxiliary modules in the inference stage, such as the EMA prototype momentum storage matrix, topological knowledge distillation loss calculation nodes, and pseudo-label filtering modules, are removed, retaining only the complete forward propagation path of the student network. This lightweight model retains only the structure required for food recognition forward inference, significantly reducing the model's computational overhead and memory usage.

[0144] With reference to Figure 6, optionally, in S202, the processor fine-tuning the initialized joint training architecture with labeled samples from the target domain sample set, based on topology-preserving relational knowledge distillation, includes:

[0145] S222, the processor constructs cross-entropy loss and prototype contrast loss based on the prototype center of the stabilized dual Riemannian manifold processing module.

[0146] S223, the processor constructs a distillation loss based on topology-preserving relational knowledge distillation.

[0147] S224, the processor uses the weighted sum of the cross-entropy loss, the prototype contrast loss, and the distillation loss as the total loss function and, based on the total loss function, executes the backpropagation algorithm to calculate the joint gradient with respect to the weights of each layer of the student network in order to update the student network.

[0148] Here, the feature discrimination ability of the student network is enhanced by combining the prototype contrast loss with the classification loss, and the general source domain knowledge of the student network is preserved by topology-preserving relational knowledge distillation. This effectively avoids overfitting and catastrophic forgetting during fine-tuning with small samples, adapts the model to the domain shift caused by complex lighting and variable shooting distances inside smart terminals, and reduces the dependence on labeled sample data from the target domain.

[0149] As described above, a stable dual Riemannian manifold processing module is obtained. Using the class prototype centers of this stable module as the metric reference, a cross-entropy loss and a prototype contrast loss are constructed to jointly constrain the feature learning of the student network. The prototype contrast loss $L_{proto}$ is constructed from the distance between the dual-space features output by the dual Riemannian manifold processing module and the corresponding class prototype centers:

[0150] .

[0151] Here, $L_{proto}$ denotes the prototype contrast loss, which constrains the features of food items of the same class to cluster around the corresponding prototype center; $\|\cdot\|_2$ denotes the L2-norm distance in Euclidean space, used to calculate the distance between a Euclidean space feature and the corresponding class prototype center; $d_{hyp}(\cdot,\cdot)$ denotes the hyperbolic geodesic distance in hyperbolic space, used to calculate the distance between a hyperbolic space feature and the corresponding class prototype center; and $y_i$ denotes the true class label of the i-th sample.
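As a non-limiting illustration, the prototype contrast loss described above may be sketched as follows. The sketch rests on assumptions that are not stated in this disclosure: the two distances are simply added with equal weight and averaged over the batch, the hyperbolic space is the unit Poincaré ball (curvature -1), and all function and argument names are hypothetical.

```python
import math

def euclidean_distance(u, v):
    """Standard L2 distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def poincare_distance(u, v):
    """Geodesic distance on the unit Poincare ball (curvature -1 is assumed)."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))

def prototype_contrast_loss(euc_feats, hyp_feats, labels, euc_protos, hyp_protos):
    """Mean distance of each sample to the prototype of its true class y_i.

    Assumption: Euclidean and hyperbolic terms are added with equal weight.
    """
    total = 0.0
    for f_euc, f_hyp, y in zip(euc_feats, hyp_feats, labels):
        total += euclidean_distance(f_euc, euc_protos[y])
        total += poincare_distance(f_hyp, hyp_protos[y])
    return total / len(labels)
```

The loss is zero when every feature coincides with the prototype of its own class and grows as features drift away from it in either space.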

[0152] A cross-entropy loss is constructed from the true label of each sample and the predicted probabilities output by the classification layer of the student network, which is a prototype Softmax classifier based on the Riemannian metric. The prototype contrast loss and the cross-entropy loss thus form a dual constraint. The prototype contrast loss enforces clustering of samples of the same class and separation of samples of different classes at the feature level, reducing the feature confusion caused by the domain shift of smart terminals. The cross-entropy loss optimizes classification accuracy at the level of the prediction result, supporting accurate identification of multiple types of food in smart terminal scenarios. Working together, the two losses improve the student network's ability to identify and classify food features in complex smart terminal environments and reduce classification errors.

[0153] To preserve the general source domain knowledge carried by the teacher network and avoid overfitting and catastrophic forgetting in the student network during fine-tuning with few samples, a distillation loss is constructed based on relational knowledge distillation (RKD). Labeled samples from the target domain are input into the teacher and student networks simultaneously, yielding the topological relationships of the features output by the teacher network, which carry the source domain knowledge, and the topological relationships of the target domain features output by the student network. The relational knowledge distillation loss measures the difference between these two sets of topological relationships, so that the student network retains the topological structure of the source domain features while learning the target domain features. During the fine-tuning phase, the teacher network's parameters are frozen; the teacher serves only as topological guidance and does not participate in gradient backpropagation, which ensures that its source domain knowledge is not disturbed by the small samples of the target domain.

[0154] The cross-entropy loss, prototype contrast loss, and distillation loss are weighted and summed to construct the total loss function. Based on the total loss function, the backpropagation algorithm is executed to calculate the joint gradient of the total loss with respect to the weights of each layer in the student network. The gradient descent algorithm then iteratively updates the weights of each layer in the student network in the direction opposite to the gradient. The total loss value is calculated after each iteration, and when it remains below the preset loss threshold for multiple consecutive iterations, the fine-tuning is considered to have converged, completing the fine-tuning process on labeled samples in the target domain.
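As a non-limiting illustration, the weighted total loss and the convergence criterion of the preceding paragraph may be sketched as follows. The weight values, the threshold, and the number of consecutive iterations (`patience`) are hypothetical placeholders; this disclosure does not specify them.

```python
def total_loss(l_ce, l_proto, l_rkd, w_ce=1.0, w_proto=0.5, w_rkd=0.5):
    """Weighted sum of the three losses; the default weights are assumptions."""
    return w_ce * l_ce + w_proto * l_proto + w_rkd * l_rkd

def has_converged(loss_history, threshold, patience):
    """True once the last `patience` total-loss values are all below `threshold`."""
    if len(loss_history) < patience:
        return False
    return all(value < threshold for value in loss_history[-patience:])
```

Requiring several consecutive sub-threshold values, rather than a single one, keeps one unusually easy batch from ending fine-tuning early.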

[0155] Optionally, in step S222, the processor constructs a cross-entropy loss based on the class prototype center of the stabilized dual Riemannian manifold processing module, including:

[0156] The processor calculates the Euclidean L2 distance and hyperbolic geodesic distance between the dual-space features of the sample and the corresponding stable class prototype center.

[0157] Based on the Riemannian metric prototype Softmax classifier, the processor converts the Euclidean L2 distance and hyperbolic geodesic distance into classification prediction probabilities through a negative exponential distribution.

[0158] The processor calculates the cross-entropy based on the predicted probability and the true label of the sample, and obtains the cross-entropy loss.

[0159] Here, the dual-space features include Euclidean space features and hyperbolic space features. The Euclidean L2 distance and hyperbolic geodesic distance between the sample's dual-space features and the corresponding stable class prototype center are calculated separately, so that the distance metrics conform to the geometric rules of the dual Riemannian manifold. A prototype Softmax classifier based on Riemannian metrics converts the calculated Euclidean L2 distance and hyperbolic geodesic distance into classification prediction probabilities through a negative exponential distribution, strengthening the correlation between samples of the same class and their prototype centers and suppressing interference from samples of different classes. The predicted probability $p_{i,k}$ that sample $x_i$ belongs to category k is given by the negative exponential distribution of the distance between the dual-space features and the class prototypes:

[0160] $$p_{i,k} = \frac{\exp\left(-d(x_i, c_k)/T\right)}{\sum_{j=1}^{K} \exp\left(-d(x_i, c_j)/T\right)}$$ where $d(x_i, c_k)$ denotes the dual-space distance, obtained from the Euclidean L2 distance and the hyperbolic geodesic distance, between the features of sample $x_i$ and the prototype center $c_k$ of class k, and K is the total number of food categories.

[0161] Here, T is a temperature coefficient, a hyperparameter used to scale the distances in the two Riemannian manifold spaces and smooth the classification probability distribution, thereby improving the stability and generalization ability of the model under small-sample and pseudo-label training. Based on the classification prediction probability and the true class label of the sample, the cross-entropy is calculated to obtain the cross-entropy loss $L_{CE}$:

[0162] $$L_{CE} = -\frac{1}{B}\sum_{i=1}^{B}\sum_{k=1}^{K} y_{i,k}\,\log p_{i,k}, \qquad p_{i,k} = \frac{\exp(z_{i,k})}{\sum_{j=1}^{K}\exp(z_{i,j})}$$

[0163] Here, $p_{i,k}$ is the predicted probability that the i-th sample belongs to the k-th class, and K is the total number of food categories; $y_{i,k}$ is the true class indicator of the i-th sample for class k, with $y_{i,k}=1$ when sample i belongs to class k and $y_{i,k}=0$ otherwise; $z_{i,k}$ is the raw output of the classification layer for the i-th sample and the k-th class, and $z_{i,j}$ is the raw output for the i-th sample and the j-th class; B is the batch size, and $\frac{1}{B}\sum_{i=1}^{B}$ averages over all samples in the current training batch so that the cross-entropy loss $L_{CE}$ stays on the same order of magnitude as the prototype contrast loss described above.

[0164] Thus, by combining dual spatial distance metrics with the Riemannian metric prototype Softmax classifier, the cross-entropy loss can accurately adapt to the feature output of the dual Riemannian manifold processing module. This avoids classification bias caused by a single spatial distance metric, and strengthens the correlation between similar samples and their corresponding prototype centers through negative exponential distribution transformation. This improves the optimization effect of cross-entropy loss on classification results, adapts to the complex feature distribution in cross-domain small-sample scenarios on smart terminals, and further enhances the model's classification accuracy.
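As a non-limiting illustration, the prototype Softmax classifier and the cross-entropy loss described above may be sketched as follows. It is assumed here, and not stated in this disclosure, that each entry of `distances` is a single combined dual-space distance to one class prototype (for example, the sum of the Euclidean L2 distance and the hyperbolic geodesic distance); the function names are hypothetical.

```python
import math

def prototype_softmax(distances, temperature=1.0):
    """Turn per-class prototype distances into probabilities.

    Logits are the negative distances scaled by the temperature T, so the
    nearest prototype receives the highest probability.
    """
    logits = [-d / temperature for d in distances]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]

def cross_entropy(batch_probs, labels):
    """Batch-averaged cross-entropy against integer class labels."""
    losses = [-math.log(probs[y]) for probs, y in zip(batch_probs, labels)]
    return sum(losses) / len(labels)
```

A larger temperature flattens the distribution: with distances `[0.0, 2.0]`, the probability of the nearer class is lower at `temperature=4.0` than at `temperature=1.0`, which is the smoothing effect attributed to T above.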

[0165] With reference to Figure 7, optionally, in S223, the processor constructing a distillation loss based on topology-preserving relational knowledge distillation includes:

[0166] S2231, the processor inputs the current training batch samples into the student network to obtain the student feature set; and inputs the current training batch samples into the teacher network to obtain the teacher feature set.

[0167] S2232, the processor calculates the Euclidean spatial distance and hyperbolic geodesic distance of any sample pair based on the teacher feature set and the student feature set, respectively, to construct the distance distillation loss.

[0168] S2233, the processor calculates, based on the teacher feature set and the student feature set respectively, the cosine similarity between the unit features of any sample pair, and constructs the angular distillation loss.

[0169] S2234, the processor sums the distance distillation loss and the angle distillation loss to obtain the distillation loss.

[0170] Here, through the synergistic effect of dual spatial distance distillation and angular distillation, the constructed topological constraint loss, namely distillation loss, can accurately constrain the characteristic topological structure of the student network to remain consistent with that of the teacher network, ensuring robust transfer of general knowledge from the source domain and avoiding overfitting and catastrophic forgetting in the cross-domain small sample fine-tuning of the student network on the smart terminal. At the same time, it adapts to the feature distribution characteristics of the dual Riemannian manifold, further improving the cross-domain generalization ability of the model and meeting the needs of food classification in the complex environment inside the smart terminal.

[0171] Specifically, labeled samples from the target domain in the current training batch are input into the student network and the teacher network simultaneously, and the feature sets output by the two networks are obtained. The student feature set $F_S$ is the set of student features $f_S$, and the teacher feature set $F_T$ is the set of teacher features $f_T$. Based on the teacher feature set and the student feature set, all sample pairs are traversed to calculate the Euclidean L2 distance in Euclidean space and the hyperbolic geodesic distance in hyperbolic space, and the distance distillation loss $L_{R\text{-}dist}$ is constructed by combining the two types of distances. In parallel, the dual-space features of each sample in the teacher and student feature sets are normalized, the cosine similarity between unit features is calculated, and the angular distillation loss $L_{R\text{-}angle}$ is constructed from the similarity difference. Summing the distance distillation loss and the angular distillation loss yields the topological constraint loss $L_{RKD} = L_{R\text{-}dist} + L_{R\text{-}angle}$.

[0172] Optionally, in step S2232, the processor calculates the Euclidean spatial distance and hyperbolic geodesic distance for any sample pair based on the teacher feature set and the student feature set, respectively, to construct the distance distillation loss, including:

[0173] The processor calculates the Euclidean distance and hyperbolic geodesic distance between any pair of samples in the teacher feature set and the student feature set, respectively.

[0174] The processor normalizes the calculated Euclidean space distance and hyperbolic geodesic distance by their batch means to obtain the normalized Euclidean space relative distance and hyperbolic geodesic relative distance.

[0175] The processor uses the Huber loss function to constrain the consistency of the distribution of normalized Euclidean space relative distance and hyperbolic geodesic relative distance, and constructs distance distillation loss.

[0176] Here, based on the dual geometric metric rules of Euclidean and hyperbolic spaces, a three-step process—bi-spatial distance calculation, mean normalization, and Huber loss joint constraint—precisely quantifies the topological deviation of the teacher-student network in the relative distance of bi-spatial features. This topological deviation only constrains the relative scale relationship of the student network's bi-spatial features to remain consistent with that of the teacher network, ensuring that the student network fully retains the bi-spatial feature scale topology learned during pre-training in the source domain when adapting to the target domain's intelligent terminal environment.

[0177] In detail, for any sample pair (i, j) in the teacher feature set, the Euclidean space distance and hyperbolic geodesic distance between the features of the pair are calculated, and likewise for any sample pair (i, j) in the student feature set. The standard L2 norm is used to calculate the Euclidean space distance. The hyperbolic geodesic distance, that is, the shortest path distance between the features of the pair along the hyperbolic manifold, is calculated based on the Poincaré ball model, Möbius algebra, and the hyperbolic geodesic distance formula. The Euclidean distance of any sample pair (i, j) is denoted $d_{euc}(i,j)$, and its hyperbolic geodesic distance is denoted $d_{hyp}(i,j)$.
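As a non-limiting illustration, the pairwise dual-space distances may be computed as sketched below. The hyperbolic distance uses the commonly used Möbius-addition form for a Poincaré ball of curvature $-c$; the curvature value `c=1.0` and all function names are assumptions, since the exact expression is not reproduced in this text.

```python
import math

def euclidean_distance(x, y):
    """Standard L2 distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mobius_add(x, y, c=1.0):
    """Mobius addition of x and y on the Poincare ball of curvature -c."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    coef_x = 1.0 + 2.0 * c * xy + c * y2
    coef_y = 1.0 - c * x2
    denom = 1.0 + 2.0 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]

def hyperbolic_distance(x, y, c=1.0):
    """Geodesic distance: (2 / sqrt(c)) * artanh(sqrt(c) * ||(-x) (+) y||)."""
    diff = mobius_add([-a for a in x], y, c)
    norm = math.sqrt(sum(d * d for d in diff))
    return 2.0 / math.sqrt(c) * math.atanh(math.sqrt(c) * norm)

def pairwise_distances(feats, dist_fn):
    """Distances for every unordered sample pair (i, j) with i < j."""
    n = len(feats)
    return [dist_fn(feats[i], feats[j]) for i in range(n) for j in range(i + 1, n)]
```

The same `pairwise_distances` helper would be applied once to the teacher features and once to the student features, for each of the two spaces.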

[0178] The batch means of the Euclidean distances, $\mu_{T,euc}$ and $\mu_{S,euc}$, and the batch means of the hyperbolic geodesic distances, $\mu_{T,hyp}$ and $\mu_{S,hyp}$, are calculated for the teacher and student networks respectively. The dual-space distances are then normalized by these means to eliminate global scale differences between the teacher and student networks, yielding normalized Euclidean space relative distances and normalized hyperbolic geodesic relative distances. Taking the teacher network as an example, the normalized Euclidean space relative distance and normalized hyperbolic geodesic relative distance are calculated as follows:

[0179] $$\hat{d}^{T}_{euc}(i,j) = \frac{d^{T}_{euc}(i,j)}{\mu_{T,euc}}, \qquad \hat{d}^{T}_{hyp}(i,j) = \frac{d^{T}_{hyp}(i,j)}{\mu_{T,hyp}}$$

[0180] Here, $d^{T}_{euc}(i,j)$ is the Euclidean distance of the teacher network sample pair and $d^{T}_{hyp}(i,j)$ is the hyperbolic geodesic distance of the teacher network sample pair; $\hat{d}^{T}_{euc}(i,j)$ and $\hat{d}^{T}_{hyp}(i,j)$ are the normalized Euclidean distance and the normalized hyperbolic geodesic distance, respectively. After normalization, the relative distances of the teacher and student networks are on the same scale, so the calculation of the topological deviation is not distorted.

[0181] The Huber loss function is used to constrain the consistency of the distributions of the normalized Euclidean space relative distances and hyperbolic geodesic relative distances, thereby constructing the distance distillation loss $L_{R\text{-}dist}$, which is given by:

[0182] .

[0183] Here, $\mu_S$ and $\mu_T$ are the batch means of the feature distances for the student and teacher networks, respectively, and the transition point of the Huber loss function is set by a hyperparameter. This loss forces the student network to stay highly consistent with the teacher network in the relative distance distribution of features in both Euclidean and hyperbolic spaces, so that the scale topology is inherited in both spaces at the same time. Because the Huber loss is quadratic for small deviations and linear for large ones, it allows fine-grained optimization when the distance deviation is small while avoiding loss oscillations caused by outliers.
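As a non-limiting illustration, the three steps of distance calculation, mean normalization, and Huber constraint may be sketched as follows for one space. The inputs are lists of pairwise distances in the same pair order for teacher and student; summing over pairs, the default `delta=1.0`, and the function names are assumptions not stated in this disclosure.

```python
def huber(a, b, delta=1.0):
    """Huber loss: quadratic for |a - b| <= delta, linear beyond it."""
    diff = abs(a - b)
    if diff <= delta:
        return 0.5 * diff * diff
    return delta * (diff - 0.5 * delta)

def mean_normalize(distances):
    """Divide each pairwise distance by the batch mean (relative distance)."""
    mu = sum(distances) / len(distances)
    return [d / mu for d in distances]

def distance_distillation_loss(teacher_dists, student_dists, delta=1.0):
    """Huber penalty between mean-normalized teacher and student distances."""
    t_hat = mean_normalize(teacher_dists)
    s_hat = mean_normalize(student_dists)
    return sum(huber(s, t, delta) for s, t in zip(s_hat, t_hat))
```

Under these assumptions, the dual-space loss would be the sum of one call on the Euclidean pairwise distances and one call on the hyperbolic pairwise distances. Because of the mean normalization, a student whose distances are a uniform rescaling of the teacher's incurs no penalty; only the relative structure is constrained.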

[0184] Optionally, in S2233, the processor calculating, based on the teacher feature set and the student feature set respectively, the cosine similarity between the unit features of any sample pair and constructing the angular distillation loss includes:

[0185] The processor normalizes the dual-space features in the teacher feature set and the student feature set to obtain unit dual-space features.

[0186] The processor calculates, for the teacher feature set and the student feature set respectively, the cosine similarity between the unit dual-space features of any sample pair.

[0187] The processor uses the mean squared error loss function to constrain the distribution consistency of cosine similarity and constructs an angular distillation loss.

[0188] Here, by normalizing the dual-space features, the angular measurement bias caused by different feature scales is eliminated, ensuring the accuracy of cosine similarity calculation. The mean squared error loss function can accurately constrain the distribution consistency of the cosine similarity of teacher and student features, strengthen the topological constraint of samples on relative angles, and adapt to the feature distribution characteristics of the bi-Riemannian manifold. At the same time, it suppresses the angular bias caused by the fluctuation of sample features in the complex environment of smart terminals.

[0189] In detail, the dual-space features in the student and teacher feature sets are each normalized to obtain unit features. The unit dual-space feature $e_T$ corresponding to a teacher feature $f_T$ and the unit dual-space feature $e_S$ corresponding to a student feature $f_S$ are calculated as $e_T = f_T / \|f_T\|_2$ and $e_S = f_S / \|f_S\|_2$. The normalized feature vector has a magnitude of 1 and represents only the semantic direction of the feature in the dual Riemannian manifold space. From the unit dual-space features, the teacher unit feature set $E_T$ and the student unit feature set $E_S$ are obtained. All unit features in both sets are traversed, and the cosine similarity between any pair of samples within each set is calculated, fully quantifying the relative semantic direction relationship between features. The cosine similarity of a teacher sample pair, $\cos_T(i,j)$, and the cosine similarity of a student sample pair, $\cos_S(i,j)$, are calculated as $\cos_T(i,j) = e_T^{i} \cdot e_T^{j}$ and $\cos_S(i,j) = e_S^{i} \cdot e_S^{j}$. Traversing all sample pairs (i, j) within the training batch yields the set of sample-pair cosine similarities for the teacher network and the set of sample-pair cosine similarities for the student network.

[0190] The mean squared error (MSE) loss function is used to constrain the cosine similarities of sample pairs in the teacher and student networks. The deviations between the two are quantified and summed to construct the angular distillation loss $L_{R\text{-}angle}$. This forces the cosine similarity distribution of the student network to be consistent with that of the teacher network, so that the topology of the relative semantic directions of the features is inherited without distortion. The mean squared error loss function accurately quantifies the deviation of continuous values, suits the numerical distribution of cosine similarity, and has low computational complexity, so it does not increase the computational cost of model training and meets the lightweight training needs of smart edge devices. The angular distillation loss $L_{R\text{-}angle}$ is calculated as:

[0191] .

[0192] The magnitude of this loss value is positively correlated with the topological deviation between the student network and the teacher network in terms of the relative angle of features. The smaller the loss value, the more complete the topological structure of the source domain feature semantic direction preserved by the student network. Thus, by using angular distillation loss to constrain the student network from the dimension of relative semantic direction of features, it ensures that the relative angle between any two food features in the student network remains consistent with that of the teacher network. This prevents the student network from disrupting the semantic association topology of food features learned during pre-training in the source domain when fine-tuning with a very small number of labeled samples from the target domain. The cosine similarity of hyperbolic features is calculated as the Euclidean cosine similarity in the tangent space at the origin. Optionally, when calculating the distillation loss, only the distance distillation loss and angular distillation loss of Euclidean space features can be calculated; that is, the hyperbolic space branch does not participate in the topology-preserving relation knowledge distillation. The hyperbolic space branch can directly constrain the hyperbolic features of the student network to converge towards the teacher prototype through prototype contrast loss, achieving stable replication of the hierarchical structure. This further reduces the model training / inference complexity, and the resulting model is better suited for lightweight deployment on smart edge devices.
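As a non-limiting illustration, the angular distillation loss may be sketched as follows. Averaging the squared deviations over all sample pairs is an assumption, as are the function names and the curvature `c=1.0`; features are assumed to be non-zero vectors. The helper `log_map_origin` maps a Poincaré ball point to the tangent space at the origin, as mentioned above for hyperbolic features.

```python
import math

def unit(v):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def log_map_origin(x, c=1.0):
    """Map a Poincare ball point to the tangent space at the origin."""
    norm = math.sqrt(sum(a * a for a in x))
    if norm == 0.0:
        return list(x)
    scale = math.atanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * a for a in x]

def pairwise_cosine(feats):
    """Cosine similarity of unit features for every pair (i, j) with i < j."""
    units = [unit(f) for f in feats]
    n = len(units)
    return [sum(a * b for a, b in zip(units[i], units[j]))
            for i in range(n) for j in range(i + 1, n)]

def angle_distillation_loss(teacher_feats, student_feats):
    """Mean squared deviation between student and teacher pair cosines."""
    cos_t = pairwise_cosine(teacher_feats)
    cos_s = pairwise_cosine(student_feats)
    return sum((s - t) ** 2 for s, t in zip(cos_s, cos_t)) / len(cos_t)
```

In the Poincaré ball model the logarithmic map at the origin only rescales a point along its own direction, so the tangent-space cosine similarity equals the cosine similarity of the original coordinates once both are normalized to unit length.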

[0193] With reference to Figure 8, optionally, in step S203, the processor performing progressive self-training on the fine-tuned joint training architecture using the target domain sample set includes:

[0194] In S231, the processor uses the teacher network to infer unlabeled samples in the target domain sample set in each training iteration to generate candidate pseudo-labels.

[0195] S232, the processor uses a dynamic confidence threshold screening strategy to verify candidate pseudo-labels to obtain highly reliable pseudo-labels; the unlabeled samples with highly reliable pseudo-labels are mixed with labeled samples to form a mixed training set.

[0196] S233, the processor inputs samples from the mixed training set into the fine-tuned joint training architecture for training, updates the student network parameters and smooths the teacher network parameters using an exponential moving average until the recognition accuracy curve of the updated joint training architecture on the target domain sample set converges smoothly.

[0197] Here, after the small-sample fine-tuning converges, to fully exploit the value of environmental data, a closed-loop progressive self-training is performed on the fine-tuned joint training architecture based on a massive amount of unlabeled samples in the target domain. Highly reliable pseudo-labels are generated through dynamic confidence thresholding, and a hybrid training set is constructed for closed-loop iterative optimization. The teacher network parameters are then smoothly updated using exponential moving average (EMA) to avoid pseudo-label noise interference and model overfitting until the model accuracy converges.

[0198] In detail, in each training iteration, a massive number of unlabeled food images from the target domain sample set are input into a teacher network with frozen weights for forward inference. The teacher network, based on a dual Riemannian manifold processing module and a Riemannian metric Softmax classifier, outputs the predicted probability distribution of the unlabeled samples' categories. The category corresponding to the highest probability can be selected as the candidate pseudo-label for that sample. Furthermore, a dynamic confidence threshold filtering strategy is used to verify the candidate pseudo-labels, eliminating those with low confidence or prone to introducing errors, and retaining only highly reliable candidate pseudo-labels. For example, a dynamic filtering rule based on the lowest confidence threshold and / or Top-K ranking within the category can be used. The unlabeled samples with highly reliable pseudo-labels after filtering are concatenated and fused with the labeled samples from the target domain sample set to form a hybrid training set, which serves as the training data for this iteration. The hybrid training set balances a small number of accurately labeled samples with a massive number of highly reliable pseudo-label samples, significantly expanding the scale of the target domain training data without increasing manual annotation costs, thus adapting to the learning scenarios of small samples across domains on smart terminals. The mixed training set is re-input into the fine-tuned joint training architecture, and forward propagation and total loss calculation are performed. Only the student network parameters are updated based on the backpropagation algorithm; the teacher network does not participate in gradient backpropagation and is smoothly updated by the student network parameters using exponential moving average (EMA). The formula for the exponential moving average in this stage is:

[0199] $$\theta_T^{(t)} = \alpha\,\theta_T^{(t-1)} + (1-\alpha)\,\theta_S^{(t)}$$

[0200] Here, $\theta_T^{(t)}$ and $\theta_T^{(t-1)}$ are the teacher network parameters for round t and round t-1, respectively; $\theta_S^{(t)}$ denotes the student network parameters for round t; and $\alpha$ is the momentum coefficient. This update effectively filters out the noise gradients caused by erroneous pseudo-labels and keeps the updates to the teacher network parameters stable. The above iterative process is repeated to continuously optimize the joint training architecture until the recognition accuracy curve of the updated joint training architecture on the target domain sample set becomes stable without significant fluctuations. At this point, the model is considered converged, and progressive self-training is complete.
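As a non-limiting illustration, the exponential moving average update of the teacher network may be sketched as follows, with the parameters flattened into plain lists of floats. The default momentum `alpha=0.99` and the function name are assumptions.

```python
def ema_update(teacher_params, student_params, alpha=0.99):
    """Return new teacher parameters: alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_params, student_params)]
```

With a momentum close to 1, a single noisy student update moves the teacher only slightly, while a student value that persists across many rounds is gradually absorbed.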

[0201] In this way, through closed-loop progressive self-training, without destroying the topological structure of the source domain features or increasing additional computational overhead, the domain offset features of the real-world scenarios of smart terminals are deeply adapted, thereby achieving continuous improvement in model recognition accuracy and robustness.

[0202] Optionally, in step S231, the processor uses the teacher network to infer unlabeled samples in the target domain sample set to generate candidate pseudo-labels, including:

[0203] The processor uses the teacher network to infer the unlabeled samples in the target domain sample set to obtain the predicted probability distribution.

[0204] The processor extracts the maximum probability and its corresponding category from the classification prediction probability distribution, and uses the category corresponding to the maximum probability as the candidate pseudo-label for the unlabeled sample.

[0205] Here, unlabeled food images from the target domain sample set are input into the teacher network for forward inference. Each unlabeled sample is processed in turn by the teacher network's feature extraction module and dual Riemannian manifold processing module to complete feature representation and fusion, and the Riemannian metric Softmax classifier then outputs the predicted probability distribution over all food categories for that sample. Let the unlabeled sample be $x_i$ and the total number of food categories be K. The predicted probability distribution output by the teacher network is then $P_i = [p_{i,1}, p_{i,2}, \ldots, p_{i,K}]$, where $p_{i,k}$ is the probability that the unlabeled sample $x_i$ is predicted as the k-th type of food. The predicted probability distribution is traversed, the maximum probability value and its food category are extracted, and that category is used as the candidate pseudo-label for the current unlabeled sample. The maximum probability is calculated as $p_i = \max_{1 \le k \le K} p_{i,k}$.

[0206] Optionally, in S232, the processor verifies candidate pseudo-labels based on a dynamic confidence threshold filtering strategy, including:

[0207] For any unlabeled sample, the processor obtains the maximum probability value $p_i$ in the corresponding predicted probability distribution.

[0208] When $p_i \geq \tau_t$, the processor retains the candidate pseudo-label for the sample.

[0209] Here, $\tau_t$ is the global minimum confidence threshold, which increases non-linearly with the number of iterations and approaches a high confidence level as training progresses; t represents the current training round.

[0210] Here, for each unlabeled sample whose candidate pseudo-label is obtained through teacher network inference, the maximum probability value $p_i$ is extracted from its predicted probability distribution. A global minimum confidence threshold $\tau_t$ is set that increases non-linearly with the number of iterations, where t represents the current training round. In the initial self-training phase, the student network has adapted only to a limited extent to the target domain features, so the threshold $\tau_t$ is kept at a relatively low level to ensure that enough reliable samples are selected for training. As the number of training rounds t increases, the student network learns the food characteristics of the target domain more deeply and its prediction accuracy continuously improves, and the threshold $\tau_t$ gradually increases non-linearly and eventually approaches a preset high confidence level. As an example, the initial value of $\tau_t$ is 0.6.

[0211] The maximum predicted probability value $p_i$ of each unlabeled sample is compared with the dynamic threshold of the current round. If $p_i$ is greater than or equal to the threshold, the candidate pseudo-label is deemed to have sufficient confidence and is retained. Otherwise, the candidate pseudo-label is deemed to have insufficient confidence and a high risk of error, and both the candidate pseudo-label and its corresponding sample are discarded. In this way, the dynamic threshold verification strategy gradually raises the pseudo-label admission standard as training progresses. This avoids both a shortage of usable samples caused by an excessively high threshold early in training and the introduction of low-confidence erroneous labels later in training, thus preventing confirmation bias and providing continuous and reliable label data for progressive self-training.
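A dynamic threshold of this kind can be sketched as below. The text fixes only the example initial value of 0.6 and the requirement that the threshold rise non-linearly toward a high confidence level; the saturating exponential form, the ceiling `tau_max = 0.95`, and the rate `gamma = 0.1` are assumptions chosen for illustration, not the schedule of this disclosure.

```python
import math

def dynamic_threshold(t, tau0=0.6, tau_max=0.95, gamma=0.1):
    """Hypothetical schedule: starts at tau0 and saturates toward tau_max as round t grows."""
    return tau_max - (tau_max - tau0) * math.exp(-gamma * t)

def passes_global_filter(p_max, t):
    """Keep a candidate pseudo-label only if its confidence reaches the round-t threshold."""
    return p_max >= dynamic_threshold(t)

# A sample with confidence 0.7 is admitted in round 0 but rejected in round 50.
early = passes_global_filter(0.7, 0)
late = passes_global_filter(0.7, 50)
```

Any other monotonically increasing, bounded schedule would fit the description equally well; the point of the sketch is only that the admission standard tightens as t grows.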

[0212] Optionally, in S232, the processor verifies candidate pseudo-labels based on a dynamic confidence threshold filtering strategy, and further includes:

[0213] For each unlabeled sample whose candidate pseudo-label has been retained, the processor determines where its confidence ranks among all samples in the current training batch that are predicted as the same candidate class.

[0214] When the confidence level ranks in the top K% of the candidate set for the same class, the processor confirms the corresponding candidate pseudo-label as a highly reliable pseudo-label.

[0215] Where K is a fixed-proportion hyperparameter.

[0216] Here, on top of the global dynamic confidence threshold screening, intra-category relative confidence ranking is added, so that the global minimum confidence threshold and the Top-K% ranking within each category form a dual screening mechanism. This further improves the reliability of pseudo-labels, alleviates the uneven distribution of samples across food categories, and also supports sample mining for long-tail rare food categories. The embodiments of this disclosure design a joint mathematical indicator function, the distribution-aware dynamic filter, which integrates the global dynamic confidence threshold constraint and the intra-category relative ranking constraint to achieve dual verification of pseudo-labels. The discriminant assignment rule for candidate pseudo-labels is as follows:

[0217] .

[0218] Here, the ranking condition indicates that the predicted probability of the i-th sample ranks in the top K% of category j, and argmax returns the index at which its argument attains its maximum.

[0219] In detail, the unlabeled samples whose candidate pseudo-labels have passed the global minimum confidence threshold and been retained are grouped according to the food category of their candidate pseudo-labels. All unlabeled samples predicted to belong to the same category form a single candidate sample set for relative confidence comparison within that category. For each candidate sample set, the maximum predicted probability values of all samples in the set are sorted in descending order to obtain a confidence ranking for that category. A fixed-proportion hyperparameter K is preset. In each candidate sample set, only samples whose confidence ranks in the top K% are retained, and their candidate pseudo-labels are confirmed as highly reliable pseudo-labels. Samples outside the top K% are discarded even if they meet the global confidence threshold, and they are no longer included in subsequent self-training.

[0220] In this way, relative ranking within each category avoids the situation where a food category has no usable samples because its overall confidence level is low, and it also effectively suppresses label noise caused by low-confidence samples within the same category. While ensuring the overall reliability of pseudo-labels, it maintains a relative balance in the number of samples across categories, enabling the student network to learn common and rare food ingredients in a balanced way during progressive self-training.
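The dual screening described above (global threshold first, then top K% within each predicted category) can be sketched as follows. The function name, the tuple layout of `predictions`, and the rounding rule (ceiling, with at least one sample kept per non-empty category) are assumptions made for illustration; the text does not specify how a fractional top-K% count is rounded.

```python
import math
from collections import defaultdict

def dual_filter(predictions, threshold, top_percent):
    """predictions: list of (sample_id, predicted_class, max_prob) tuples.

    Returns {sample_id: pseudo_label} for samples passing both checks.
    """
    # Stage 1: global confidence threshold, grouped by predicted category.
    by_class = defaultdict(list)
    for sample_id, cls, p in predictions:
        if p >= threshold:
            by_class[cls].append((p, sample_id))

    # Stage 2: keep only the top K% most confident samples in each category.
    accepted = {}
    for cls, items in by_class.items():
        items.sort(reverse=True)  # descending confidence
        keep = max(1, math.ceil(len(items) * top_percent / 100))  # assumed rounding rule
        for _, sample_id in items[:keep]:
            accepted[sample_id] = cls
    return accepted

preds = [("a", 0, 0.95), ("b", 0, 0.90), ("c", 0, 0.80), ("d", 0, 0.70),
         ("e", 1, 0.72), ("f", 1, 0.50)]
reliable = dual_filter(preds, threshold=0.7, top_percent=50)
```

In this hypothetical batch, sample "f" fails the global threshold, samples "c" and "d" pass it but fall outside the top 50% of category 0, and the rare category 1 still contributes sample "e".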

[0221] Optionally, in step S233, the processor trains the fine-tuned joint training architecture by inputting samples from the mixed training set, and updates the student network parameters, including:

[0222] The processor trains the fine-tuned student network using samples from the mixed training set and recalculates the prototype clustering loss, classification loss, and distillation loss.

[0223] The processor performs a weighted summation of the recalculated prototype clustering loss, classification loss, and distillation loss to form a new total loss function, and calculates the gradient of the new total loss function with respect to the trainable weights of each layer of the student network.

[0224] The processor uses an adaptive optimizer to iteratively update the weight parameters of the feature extraction module, the bi-Riemannian manifold processing module, and the classification layer of the fine-tuned student network.

[0225] Here, the samples from the mixed training set are re-input into the fine-tuned joint training architecture for progressive self-training and updating of the student network parameters, thereby enhancing the student network's target domain adaptation capability and continuously optimizing classification accuracy. This stage continues the dual-constraint optimization logic of the fine-tuning stage, recalculating various losses based on the mixed training set, constructing the total loss function, and completing the gradient update of the student network.

[0226] A mixed training set containing labeled samples from the target domain and unlabeled samples with highly reliable pseudo-labels is input into the fine-tuned joint training architecture, and a complete forward propagation is performed through the student network only. Samples are processed in sequence by the feature extraction module, the bi-Riemannian manifold processing module, and the Riemannian metric Softmax classifier to output prediction results. The cross-entropy loss, prototype contrast loss, and distillation loss are recalculated and assigned preset hyperparameter weights, and a new total loss function for the self-training phase is constructed through weighted summation. Based on this total loss function, the gradient with respect to the trainable weights of each layer of the student network is calculated using the backpropagation algorithm. Gradients are calculated only for the trainable parameters of the student network; the teacher network in the joint training architecture does not participate in gradient backpropagation and serves only as a reference benchmark for the source domain feature topology, so that gradient updates affect only the target domain adaptation of the student network. An adaptive optimizer, such as adaptive moment estimation (Adam), can be used to iteratively update all trainable weight parameters of the fine-tuned student network's feature extraction module, bi-Riemannian manifold processing module, and classification layer based on the calculated gradients. Because the adaptive optimizer dynamically adjusts the parameter update step size, the student network's food classification accuracy in the target domain improves while the bi-Riemannian manifold feature representation and the source domain feature topology are maintained, until the student network parameters converge smoothly.
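The student-only parameter update can be illustrated with one step of the standard Adam rule. This is a sketch on a flat list of scalar weights: the function name, the state layout, the default hyperparameters, and the example gradients are all hypothetical, and the gradients are assumed to have already been obtained by backpropagating the total loss through the student network.

```python
import math

def adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to the student weights; the teacher is never passed in."""
    state["t"] += 1
    t = state["t"]
    updated = []
    for i, (w, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and the squared gradient.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction for the zero-initialized moments.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        # Per-parameter adaptive step size.
        updated.append(w - lr * m_hat / (math.sqrt(v_hat) + eps))
    return updated

student = [0.5, -0.3]                      # hypothetical student weights
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
student = adam_step(student, [0.2, -0.1], state)
```

On the first step the bias-corrected update has magnitude close to `lr` for every weight regardless of gradient scale, which is the step-size adaptation the paragraph refers to.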

[0227] Optionally, in step S233, the processor smoothly updates the teacher network parameters using the updated student network parameters via an exponential moving average (EMA), including:

[0228] The processor inputs the parameters from the previous training round of the teacher network and the updated parameters from the current training round of the student network into the exponential moving average calculation formula to smoothly fuse the weights, thus obtaining the updated parameters of the teacher network for the current round.

[0229] The processor freezes the updated teacher network parameters as a benchmark for calculating topological constraint loss in the next round of training.

[0230] Here, the teacher network parameters are smoothly updated from the updated student network parameters through an exponential moving average (EMA) to suppress pseudo-label noise, keep the teacher network parameters stable, and keep the topology constraints consistent. Smoothing and fusing the parameters avoids abrupt changes in the teacher network, and freezing the updated teacher network parameters as the topology constraint benchmark ensures that the student network always optimizes against stable source domain topology characteristics.

[0231] The parameters of the teacher network fixed after the previous training round and the parameters of the student network updated by the gradient of the total loss function in the current round are input together into the exponential moving average formula. The two sets of parameters are smoothly fused to obtain the updated teacher network parameters for the current round. The formula for calculating the exponential moving average is given above. The momentum coefficient α can be set to 0.999, which effectively filters out noisy gradients caused by pseudo-labeled samples and ensures smooth iteration of the teacher network parameters.

[0232] The parameters of the teacher network, obtained through exponential moving average smoothing and fusion, are weighted and frozen after the current round, preventing them from participating in gradient backpropagation in the current and next training rounds. These frozen teacher network parameters will serve as the sole benchmark for calculating the topological constraint loss in the next training iteration, used to extract stable feature topological relationships and compare their relative geometric relationships with those of the student network features. This constrains the student network to continuously retain the robust feature topological structure learned during pre-training in the source domain, preventing the topological constraints from failing due to drastic fluctuations in the teacher network parameters, and ensuring the stability and convergence of the cross-domain small-sample self-training process.
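The teacher update can be sketched with the common form of the exponential moving average, in which the previous teacher parameters are weighted by α and the current student parameters by 1 − α. That this is exactly the formula used in this disclosure is an assumption; the function name and the flat parameter lists are hypothetical.

```python
def ema_update(teacher, student, alpha=0.999):
    """Blend previous teacher weights with current student weights.

    The result is a new list; it is treated as frozen (no gradients)
    until the next call.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]

teacher = [1.0, 0.0]   # teacher parameters after the previous round
student = [0.0, 1.0]   # student parameters after the current gradient update
teacher = ema_update(teacher, student)
```

With α = 0.999 a single round moves each teacher weight by only 0.1% of its gap to the student, which is why noisy updates caused by occasional wrong pseudo-labels are heavily damped.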

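[0233] Optionally, the total loss function $L_{\mathrm{total}}$ is: $L_{\mathrm{total}} = L_{\mathrm{CE}} + \lambda_1 L_{\mathrm{proto}} + \lambda_2 L_{\mathrm{distill}}$, where $L_{\mathrm{proto}}$ denotes the prototype contrast loss and $L_{\mathrm{distill}}$ denotes the distillation loss.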

[0234] Where $\lambda_1$ represents the prototype contrast loss weight and $\lambda_2$ represents the distillation loss weight. The cross-entropy loss $L_{\mathrm{CE}}$ directly determines the correctness of ingredient classification and is the core optimization objective and principal loss of the student network, hence its weight is 1. During both the fine-tuning and progressive self-training phases, the total loss function uses the formula above.
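The weighted summation can be written out directly. In this sketch the argument names and the example loss values and weights are hypothetical; only the structure (cross-entropy with weight 1, plus the two weighted auxiliary losses) follows the description.

```python
def total_loss(l_ce, l_proto, l_distill, lambda1, lambda2):
    """Cross-entropy has fixed weight 1; the other two losses are scaled."""
    return l_ce + lambda1 * l_proto + lambda2 * l_distill

# Hypothetical per-batch loss values and weights.
loss = total_loss(l_ce=1.2, l_proto=0.4, l_distill=0.2, lambda1=0.5, lambda2=0.3)
```

Setting both weights to zero reduces the objective to plain cross-entropy, which makes the role of the two auxiliary terms easy to ablate.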

[0235] With reference to Figure 9, optionally, after step S204, in which the processor, once progressive self-training has converged, removes the teacher network from the joint training architecture and retains only the forward propagation path of the student network, the method further includes:

[0236] S205, the processor inputs the samples from the calibration set into the retained student network to complete the forward propagation, statistically analyzes the numerical distribution characteristics of the weights and activation values of each convolutional layer and fully connected layer of the student network, and calculates the quantization scaling factor and zero-point drift value of each layer.

[0237] S206, the processor performs lossless quantization transformation on the weights and activation values of the retained student network based on the affine quantization mapping formula, combined with the quantization scaling factor and zero-point drift value, compressing the floating-point weights and activation values into integers, thus obtaining a lightweight student network model.

[0238] The calibration set is composed of samples selected from the target domain dataset.

[0239] Here, lossless quantization based on the target domain data distribution is performed on the retained student network to further compress the model size and reduce inference computational consumption, enabling the final model to efficiently adapt to the low-storage, low-computing-power operating environment of embedded hardware at the edge of smart terminals. Specifically, representative food image samples with scene, category, and pose characteristics are selected from the target domain dataset to form a calibration set. This calibration set, derived from the target domain sample set, ensures that subsequent quantization parameters conform to the target domain features, avoiding large quantization errors introduced by differences in data distribution. The samples in the calibration set are input into the student network that retains only the forward propagation path, and a complete forward propagation is completed. During inference, the weight distribution of each convolutional layer and fully connected layer, as well as the activation value distribution characteristics of the forward output of each layer, are statistically analyzed, including the value range, extreme values, and distribution intervals. Based on the statistically obtained numerical distribution characteristics, the quantization scaling factor and zero-point drift value corresponding to each layer are calculated.

[0240] Based on a predefined affine quantization mapping formula, and combined with the independent quantization scaling factor S and zero-point drift value Z for each layer, lossless quantization transformation is performed on the retained student network weights and activation values, compressing the weights and activation values, originally stored as floating-point numbers, into integer data. The predefined affine quantization mapping formula is as follows:

[0241] $q = \mathrm{round}(r / S) + Z$.

[0242] Where q is the quantized value, r is the floating-point value, and round is the rounding function. The scaling factor S and the zero-point drift value Z are not fixed hyperparameters; they are calculated by statistically analyzing the data distribution of each layer of the student network using the target domain calibration set (food image samples from real-world scenes on smart terminals).
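The calibration and mapping steps can be sketched for a single layer as follows. The text states only that S and Z are derived from the observed value distribution and that floating-point values are mapped to integers; the unsigned 8-bit range, the min/max calibration rule, the clamping, and all function names below are assumptions made for illustration.

```python
def calibrate(values, num_bits=8):
    """Derive a per-layer scale S and zero point Z from observed values (min/max rule)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    r_min = min(min(values), 0.0)  # make sure 0.0 is representable
    r_max = max(max(values), 0.0)
    scale = (r_max - r_min) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = int(round(qmin - r_min / scale))
    return scale, max(qmin, min(qmax, zero_point))

def quantize(r, scale, zero_point, num_bits=8):
    """Affine mapping q = round(r / S) + Z, clamped to the integer range."""
    q = int(round(r / scale)) + zero_point
    return max(0, min(2 ** num_bits - 1, q))

def dequantize(q, scale, zero_point):
    """Approximate inverse mapping r ~ S * (q - Z)."""
    return scale * (q - zero_point)

values = [-1.0, 0.0, 0.5, 2.0]   # hypothetical activations seen on the calibration set
scale, zp = calibrate(values)
codes = [quantize(v, scale, zp) for v in values]
```

For values inside the calibrated range, the round trip through `quantize` and `dequantize` differs from the original by at most half the scale, so a calibration set that matches the target domain distribution keeps that error small.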

[0243] This quantization process significantly reduces the model's storage space and the amount of floating-point operations at the edge, while ensuring that the accuracy of food identification is basically unaffected. This allows the lightweight student network model to be directly deployed on embedded chips in smart terminals, enabling stable and fast localized food identification.

[0244] When training the lightweight student network model, the network environment and hyperparameters are initialized. To balance the model's representational power with the computational constraints of subsequent edge devices, the training batch size in this example is set to 32, the adaptive moment estimation (Adam) optimizer is used to optimize the network weights, the initial learning rate is set to 5 × 10⁻⁴, and a weight decay coefficient of 1 × 10⁻⁴ is introduced. Penalizing the L2 norm of the parameter matrix suppresses overfitting of the model.

[0245] As shown in Figure 10, this disclosure provides a cross-domain small-sample object image classification device 100 for a smart terminal, including a processor 101 and a memory 102. Optionally, the device may further include a communication interface 103 and a bus 104. The processor 101, communication interface 103, and memory 102 can communicate with each other via the bus 104. The communication interface 103 can be used for information transmission. The processor 101 can call logical instructions in the memory 102 to execute the cross-domain small-sample object image classification method for a smart terminal described in the above embodiment.

[0246] Furthermore, the logical instructions in the aforementioned memory 102 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium.

[0247] The memory 102, as a computer-readable storage medium, can be used to store software programs and computer-executable programs, such as program instructions / modules corresponding to the methods in the embodiments of this disclosure. The processor 101 executes functional applications and data processing by running the program instructions / modules stored in the memory 102, that is, it implements the cross-domain small sample object image classification method for smart terminals in the above embodiments.

[0248] The memory 102 may include a program storage area and a data storage area. The program storage area may store the operating system and applications required for at least one function; the data storage area may store data created based on the use of the terminal device. Furthermore, the memory 102 may include high-speed random access memory and may also include non-volatile memory.

[0249] As shown in Figure 11, this disclosure provides a smart terminal 200, including: a smart terminal body, with a built-in image acquisition device for acquiring object images; and the aforementioned cross-domain small-sample object image classification device 100 for the smart terminal. The cross-domain small-sample object image classification device 100 for the smart terminal is installed in the smart terminal 200. The installation relationship described herein is not limited to placement inside the smart terminal body, but also includes installation connections with other components of the smart terminal 200, including but not limited to physical connections, electrical connections, or signal transmission connections. Those skilled in the art will understand that the cross-domain small-sample object image classification device 100 for the smart terminal can be adapted to feasible smart terminal bodies to achieve other feasible embodiments. Optionally, the smart terminal includes a smart freezer, a smart refrigerator, or a smart preservation cabinet.

[0250] This disclosure provides a computer-readable storage medium storing computer-executable instructions configured to perform the above-described cross-domain small sample object image classification method for smart terminals.

[0251] The technical solutions of this disclosure can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes one or more instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the method described in this disclosure. The aforementioned storage medium can be a non-transitory storage medium, such as a USB flash drive, external hard drive, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk, or other media capable of storing program code.

[0252] The foregoing description and accompanying drawings fully illustrate embodiments of this disclosure to enable those skilled in the art to practice them. Other embodiments may include structural, logical, electrical, procedural, and other changes. The embodiments represent only possible variations. Individual components and functions are optional unless explicitly required, and the order of operation may vary. Parts and features of some embodiments may be included in or replace parts and features of other embodiments. Moreover, the terminology used in this application is for describing embodiments only and is not intended to limit the claims. As used in the description of embodiments and claims, the singular forms “a,” “an,” and “the” are intended to equally include the plural forms unless the context clearly indicates otherwise. Similarly, the term “and / or” as used in this application means including one or more of the associated listed items and all possible combinations thereof. Additionally, when used in this application, the term "comprise" and its variations "comprises" and / or "comprising" refer to the presence of stated features, integers, steps, operations, elements, and / or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and / or groups thereof. Without further limitations, an element defined by the phrase "comprises a..." does not exclude the presence of other identical elements in the process, method, or apparatus that includes said element. In this document, each embodiment may focus on its differences from other embodiments, and similar or identical parts of different embodiments can be cross-referenced. Where methods, products, etc., disclosed in the embodiments correspond to the method section disclosed in the embodiments, the relevant parts can be found in the description of the method section.

[0253] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the embodiments of this disclosure. Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0254] The methods and products (including but not limited to devices and equipment) disclosed in the embodiments herein can be implemented in other ways. For example, the device embodiments described above are merely illustrative. For instance, the division of units may be merely a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling or direct coupling or communication connection shown or discussed between each other may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected to implement this embodiment according to actual needs. In addition, the functional units in the embodiments of this disclosure may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

[0255] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than that shown in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. In the descriptions corresponding to the flowcharts and block diagrams in the accompanying drawings, the operations or steps corresponding to different blocks may also occur in a different order than disclosed in the description, and sometimes there is no specific order between different operations or steps. For example, two consecutive operations or steps may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. Each block in a block diagram and / or flowchart, and combinations of blocks in a block diagram and / or flowchart, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

Claims

1. A method for cross-domain small-sample object image classification for a smart terminal, characterized in that it includes: acquiring images of the food ingredients to be identified, and processing these images to extract basic feature vectors; mapping the basic feature vectors using a dual Riemannian manifold processing module to obtain Euclidean space features and hyperbolic space features respectively; fusing the Euclidean space features and the hyperbolic space features to obtain a comprehensive feature representation for food classification; and determining the food classification results based on the comprehensive feature representation; wherein the dual Riemannian manifold processing module includes a Euclidean space branch and a hyperbolic space branch, and is trained in the following way: obtaining a cross-domain food image sample set, which includes source domain food images and target domain food images; based on the cross-domain food image sample set, constructing corresponding class prototype centers in Euclidean space (within the Euclidean space branch) and in hyperbolic space (within the hyperbolic space branch), and updating the prototype centers of each class, specifically: initializing the class prototype centers corresponding to each food category in Euclidean space and hyperbolic space; calculating the mean of the corresponding category features based on the features of food samples of the same category in the current training batch; smoothly updating the class prototype centers corresponding to the Euclidean space based on the calculated mean of the corresponding category features and an exponential moving average; and updating the class prototype centers corresponding to the hyperbolic space using Möbius addition and an exponential moving average in combination with the calculated mean of the corresponding category features; calculating and backpropagating gradients, based on the updated prototype centers and the prototype contrast loss corresponding to Euclidean space and hyperbolic space, to optimize the network parameters of the dual Riemannian manifold processing module; and iteratively optimizing the dual Riemannian manifold processing module until the loss converges, to obtain the trained dual Riemannian manifold processing module.

2. The method of claim 1, wherein the hyperbolic space branch adopts a Poincaré ball model with negative curvature, and the curvature is constant.

3. The method of claim 2, wherein mapping the basic feature vectors using the dual Riemannian manifold processing module to obtain the hyperbolic space features includes: performing a linear transformation on the basic feature vectors to obtain hyperbolic tangent space vectors; projecting the hyperbolic tangent space vectors onto the Poincaré ball model using the Riemannian exponential map to obtain initial hyperbolic space features; and correcting the initial hyperbolic space features by Möbius-based algebraic operations to obtain the final hyperbolic space features.

4. The method of claim 1, wherein initializing the class prototype centers corresponding to each food category in Euclidean space and hyperbolic space includes: initializing the prototype centers of each category in Euclidean space as zero vectors or small-scale random vectors; and initializing the prototype centers of each category in hyperbolic space as vectors near the origin of the Poincaré ball model, ensuring that the initial prototype centers are constrained within the hyperbolic manifold.

5. The method according to claim 1, characterized in that smoothly updating the class prototype centers corresponding to the Euclidean space using the standard numerical mean and the exponential moving average includes: presetting the weighting coefficient of the exponential moving average; and updating the class prototype center for the current training batch by weighted fusion of the historical prototype center from the previous iteration and the feature mean of the samples of the same category in the current training batch.

6. The method according to claim 1, characterized in that updating the class prototype centers corresponding to the hyperbolic space using Möbius addition and the exponential moving average includes: presetting the weighting coefficient of the exponential moving average; and using Möbius addition to perform weighted fusion of the historical prototype center from the previous iteration and the feature mean of the samples of the same category in the current batch, so as to update the hyperbolic space prototype center for the current training batch.

7. The method according to claim 1, characterized in that calculating and backpropagating gradients, based on the updated prototype centers and the prototype contrast loss corresponding to Euclidean space and hyperbolic space, to optimize the network parameters of the dual Riemannian manifold processing module includes: using the updated class prototype centers as the measurement benchmark; in Euclidean space, calculating the spatial distance between the Euclidean space features of a sample and the corresponding class prototype center using the L2 norm; in hyperbolic space, calculating the spatial distance between the hyperbolic space features of the sample and the corresponding class prototype center using the hyperbolic geodesic distance; constructing, based on the spatial distance calculated in each space, a prototype contrast loss for Euclidean space and a prototype contrast loss for hyperbolic space respectively, to form a joint prototype contrast loss; and optimizing the network parameters of the dual Riemannian manifold processing module by minimizing the prototype contrast loss through backpropagation.

8. The method according to claim 7, characterized in that calculating, in hyperbolic space, the spatial distance between the hyperbolic space features of a sample and the corresponding class prototype center using the hyperbolic geodesic distance includes: calculating the relative vector between the hyperbolic space features of the sample and the corresponding class prototype center using Möbius addition; and calculating the Euclidean norm of the relative vector and substituting it into the hyperbolic geodesic distance formula to obtain the spatial distance between the hyperbolic space features of the sample and the corresponding class prototype center.

9. The method according to claim 7, characterized in that iteratively optimizing until the loss converges to obtain the trained dual Riemannian manifold processing module includes: after each iteration completes the class prototype center update and the network parameter optimization, calculating the total loss value of the Euclidean space prototype contrast loss and the hyperbolic space prototype contrast loss; and when the total loss value is lower than a preset loss threshold for multiple consecutive iterations, stopping the iteration and obtaining the trained dual Riemannian manifold processing module.

10. A cross-domain small-sample object image classification device for a smart terminal, comprising a processor and a memory storing program instructions, characterized in that the processor is configured to execute, when running the program instructions, the cross-domain small-sample object image classification method for a smart terminal as described in any one of claims 1 to 9.