An AI-based learning system

By building an AI-based learning system and utilizing multi-source data and dynamic adaptive algorithms, the problem of performance degradation of machine vision recognition systems in dynamic environments has been solved, enabling the system to self-optimize and continuously learn, thereby improving recognition accuracy and operational efficiency.

CN122242585A · Pending · Publication Date: 2026-06-19 · HANGZHOU ELECTRIC EQUIP MFG +2

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HANGZHOU ELECTRIC EQUIP MFG
Filing Date
2026-01-26
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing machine vision recognition systems cannot self-optimize in dynamic environments or adapt to changes in lighting, background interference, and the appearance of new objects, which results in high maintenance costs. They also lack any assessment of their own predictive reliability, and their human-machine collaboration is inefficient.

Method used

By employing modules for collaborative perception and data acquisition, incremental learning and model management, online recognition and uncertainty assessment, human-machine collaborative annotation and feedback correction, and dynamic scene database and knowledge graph, an AI-based learning system is constructed to achieve an automated closed loop of perception-recognition-evaluation-feedback-learning. The system is self-optimized through a hybrid approach combining Monte Carlo Dropout and knowledge distillation with elastic weight consolidation.

Benefits of technology

The system can proactively adapt to environmental changes, continuously improve its performance, and reduce operation and maintenance costs. It achieves continuous learning and high-precision recognition, provides a traceable basis for its decisions, and is well suited to engineering implementation.

✦ Generated by Eureka AI based on patent content.

Figure CN122242585A_ABST

Abstract

This invention relates to the fields of artificial intelligence and machine vision technology, and discloses an AI-based learning system, comprising: a collaborative perception and data acquisition module; an incremental learning and model management module; an online recognition and uncertainty assessment module; a human-machine collaborative annotation and feedback correction module; a dynamic scene database and knowledge graph module; and a system bus and data management module. This AI-based learning system acquires multi-source data through the collaborative perception module and utilizes an online recognition module integrating the Monte Carlo Dropout method for uncertainty quantification and dynamic threshold judgment. It can proactively identify its own cognitive boundaries, accurately filter out difficult samples, and trigger a human-machine collaborative annotation process. Correction feedback from human experts is immediately absorbed by the system as high-quality training data, thus forming an automated closed loop of perception-recognition-evaluation-feedback-learning, solving the problem of traditional static models failing due to scene changes.

Description

Technical Field

[0001] This invention relates to the fields of artificial intelligence and machine vision technology, specifically to an AI-based learning system.

Background Technology

[0002] Machine vision refers to a technological system that uses optical devices such as cameras and non-contact sensors to automatically receive and process images or video streams of real scenes, and uses artificial intelligence and computer algorithms to extract, analyze and understand key information in order to simulate human visual perception.

[0003] Currently, existing machine vision recognition systems mainly adopt a static training-deployment model, which involves training models on fixed datasets in an offline environment and then directly applying them to real-world scenarios. This model often suffers from drawbacks such as the system's inability to adapt to dynamic environmental conditions, including changes in lighting, background interference, and the appearance of new objects. Once deployed, the model's knowledge becomes fixed, making it impossible to optimize itself using new data generated during operation. When recognition performance degrades, it requires manual data collection, labeling, and full model training, resulting in high maintenance costs and long response cycles. Furthermore, traditional systems lack mechanisms for evaluating their own predictive reliability, cannot identify cognitive boundaries, and lack the ability to proactively seek guidance from human experts, leading to low efficiency in human-machine collaboration.

Summary of the Invention

[0004] The purpose of this invention is to provide an AI-based learning system to solve the problems mentioned in the background section.

[0005] To address the aforementioned technical problems, the present invention provides the following technical solution: an AI-based learning system, comprising: a collaborative perception and data acquisition module, used to connect and manage multiple visual sensors and acquire raw environmental perception data; an incremental learning and model management module, used to host and update the core recognition model; an online recognition and uncertainty assessment module, used to recognize real-time or stored data and assess the reliability of the recognition results; a human-machine collaborative annotation and feedback correction module, used to provide a human-machine interaction interface and receive correction information for difficult samples; a dynamic scene database and knowledge graph module, used to store data throughout the entire system operation process and to build semantic relationships between objects; and a system bus and data management module, responsible for the data flow and scheduling between all modules.

[0006] Preferably, the collaborative perception and data acquisition module specifically includes: a multi-source sensor interface unit that supports access to RGB cameras, depth cameras, and infrared thermal imagers, an active perception control unit, and a data preprocessing unit. The active perception control unit adjusts the gimbal pose or camera focal length according to the confidence level output by the online recognition and uncertainty assessment module through a preset strategy to obtain multi-view or clearer images of the target. The data preprocessing unit performs automatic white balance, noise reduction, and standardized size cropping on the acquired raw images, and binds the processed image data with timestamps, sensor IDs, and acquisition parameter metadata, storing it in the dynamic scene database and knowledge graph module.

[0007] Preferably, the core of the incremental learning and model management module is a main recognition model based on the fusion of convolutional neural networks and attention mechanisms. The incremental learning algorithm adopts a hybrid method that combines knowledge distillation and elastic weight consolidation. The incremental learning and model management module also includes a model version repository for saving and rolling back historical models.

[0008] Preferably, during inference, the online recognition and uncertainty assessment module, in addition to outputting the category probability distribution, uses the Monte Carlo Dropout method for uncertainty assessment: during the testing phase, the same input sample is forward-propagated multiple times, with some neurons randomly dropped each time, thus obtaining multiple softmax probability distributions; the average entropy or probability variance of these distributions is calculated as a measure of epistemic (cognitive) uncertainty; at the same time, the maximum softmax probability output by the model is used as a reference for aleatoric (random) uncertainty; when the overall uncertainty score exceeds the adaptive threshold or the highest category probability is lower than the confidence threshold, the sample is marked as a difficult sample.

[0009] Preferably, the human-machine collaborative annotation and feedback correction module provides two interactive interfaces: Web and augmented reality. It integrates an active learning strategy. The human-machine collaborative annotation and feedback correction module records all user correction operations and uses them and corresponding samples as high-quality positive and negative sample pairs, assigns them the highest priority weight, and sends them to the incremental learning and model management module for the next round of fine-tuning training.

[0010] Preferably, the dynamic scene database and knowledge graph module adopts a hierarchical storage structure, including: a raw data layer, a labeled data layer, and a log event layer. It uses a graph database for storage and related queries, providing contextual reasoning support for the recognition results.

[0011] Preferably, the main recognition model uses ResNet-50 or EfficientNet-B3 as the feature extraction backbone network, and then cascades a feature pyramid network structure to achieve multi-scale feature fusion; the model output head is dynamically expanded according to the task, using a fully connected layer for classification tasks and a region proposal network and bounding box regressor for detection tasks.

[0012] Preferably, the adaptive threshold is set using a dynamic statistical method. In the initial stage, the system uses a preset fixed threshold. After running for a period of time, the system periodically analyzes the distribution of the confidence scores of all recognition results. Either the 5th percentile of the confidence score distribution, or its mean minus twice its standard deviation, is set as the new confidence threshold, and the 95th percentile of the uncertainty score distribution is set as the new uncertainty threshold, so that the thresholds adjust adaptively as the system performance changes.

[0013] Preferably, the system bus and data management module is implemented as an asynchronous communication architecture based on message queues, adopting a publish-subscribe pattern. Each module acts as an independent service, triggering corresponding operations by subscribing to messages on specific topics, supporting distributed deployment and horizontal scaling.

[0014] Compared with the prior art, the beneficial effects achieved by the present invention are as follows. First, this invention acquires multi-source data through a collaborative perception module and uses an online recognition module integrating the Monte Carlo Dropout method to quantify uncertainty and determine dynamic thresholds. It can proactively identify its own cognitive boundaries, accurately screen out difficult samples, and trigger a human-machine collaborative annotation process. The correction feedback from human experts is immediately absorbed by the system as high-quality training data, thus forming an automated closed loop of perception-recognition-evaluation-feedback-learning. This enables the system to proactively adapt to environmental changes and new targets, achieving continuous performance improvement and lifelong learning in real, complex, and ever-changing application scenarios, solving the problem of traditional static models failing due to scene changes.

[0015] Secondly, the incremental learning module of this invention adopts a hybrid method of knowledge distillation and elastic weight consolidation. While learning new knowledge, it retains the memory of old knowledge through knowledge distillation and uses the elastic weight consolidation algorithm to protect important network parameters, effectively alleviating the problem of catastrophic forgetting in incremental learning. At the same time, the main recognition model adopts high-performance backbone networks such as ResNet-50 or EfficientNet-B3 combined with a feature pyramid structure, ensuring multi-scale feature fusion and high-precision recognition. This allows the system to remain stable during continuous learning and quickly integrate new knowledge into the model, maximizing the improvement of recognition performance with minimal computation and annotation costs.

[0016] Third, this invention, through a hierarchical dynamic scene database and knowledge graph module, stores and associates raw data, labeled data, and operation logs in layers, achieving not only efficient data management but also constructing a semantic network that supports contextual reasoning. Furthermore, the entire system is modularly integrated based on an asynchronous communication architecture using message queues, with each module collaborating loosely as an independent service, supporting distributed deployment and horizontal scaling. This allows for easy handling of massive amounts of data and complex business flows, giving the system strong engineering implementation capabilities. Moreover, through complete log recording and knowledge association, it provides traceable and interpretable evidence for every recognition decision, enhancing the system's reliability and maintainability.

Attached Figure Description

[0017] Figure 1 is a system architecture block diagram of the present invention; Figure 2 is a system workflow diagram of the present invention.

Detailed Implementation

[0018] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0019] This invention provides the following technical solutions:

Example

[0020] Referring to Figure 2, an AI-based learning system includes: a collaborative perception and data acquisition module, used to connect and manage multiple visual sensors and acquire raw environmental perception data; an incremental learning and model management module, used to host and update the core recognition model; an online recognition and uncertainty assessment module, used to recognize real-time or stored data and assess the reliability of the recognition results; a human-machine collaborative annotation and feedback correction module, used to provide a human-machine interaction interface and receive correction information for difficult samples; a dynamic scene database and knowledge graph module, used to store data throughout the entire system operation process and to build semantic relationships between objects; and a system bus and data management module, responsible for the data flow and scheduling between all modules.

[0021] The collaborative perception and data acquisition module specifically includes: a multi-source sensor interface unit that supports access to RGB cameras, depth cameras, and infrared thermal imagers; an active perception control unit; and a data preprocessing unit. The active perception control unit adjusts the gimbal pose or camera focal length according to the confidence level output by the online recognition and uncertainty assessment module, using a preset strategy to obtain multi-view or clearer images of the target. The data preprocessing unit performs automatic white balance, noise reduction, and standardized size cropping on the acquired raw images, and binds the processed image data with timestamps, sensor IDs, and acquisition parameter metadata, storing it in the dynamic scene database and knowledge graph module.
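The preprocessing and metadata binding described above can be illustrated with a minimal sketch. It assumes a gray-world algorithm for automatic white balance and a center crop for size standardization; the text names neither technique, and noise reduction is omitted for brevity. The names `Frame`, `gray_world_balance`, `center_crop`, and `preprocess` are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class Frame:
    """Processed image bound to its acquisition metadata (hypothetical record type)."""
    pixels: list        # rows of (r, g, b) tuples, values 0-255
    sensor_id: str
    timestamp: float
    acquisition: dict   # e.g. focal length, exposure


def gray_world_balance(pixels):
    """Scale each channel so its mean matches the overall mean (gray-world assumption)."""
    flat = [p for row in pixels for p in row]
    means = [sum(p[c] for p in flat) / len(flat) for c in range(3)]
    gray = sum(means) / 3
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [
        [tuple(min(255, round(p[c] * gains[c])) for c in range(3)) for p in row]
        for row in pixels
    ]


def center_crop(pixels, size):
    """Cut a size x size window from the middle of the image."""
    height, width = len(pixels), len(pixels[0])
    top, left = (height - size) // 2, (width - size) // 2
    return [row[left:left + size] for row in pixels[top:top + size]]


def preprocess(pixels, sensor_id, acquisition, size, timestamp=None):
    """White-balance, crop, and bind metadata before storage."""
    cropped = center_crop(gray_world_balance(pixels), size)
    stamp = time.time() if timestamp is None else timestamp
    return Frame(cropped, sensor_id, stamp, dict(acquisition))
```

A record built this way carries everything the raw data layer needs to relate an image back to the sensor and the settings that produced it.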

[0022] During inference, the online recognition and uncertainty assessment module uses the Monte Carlo Dropout method to estimate uncertainty in addition to outputting the category probability distribution. In the testing phase, the same input sample is forward-propagated multiple times, and some neurons are randomly dropped each time, resulting in multiple softmax probability distributions. The average entropy or probability variance of these distributions is calculated as a measure of epistemic (cognitive) uncertainty. At the same time, the maximum softmax probability output by the model is used as a reference for aleatoric (random) uncertainty. When the overall uncertainty score exceeds the adaptive threshold or the highest category probability is lower than the confidence threshold, the sample is marked as a difficult sample.

[0023] Monte Carlo Dropout is a Bayesian approximation method for estimating the uncertainty of predictions in deep neural networks. It keeps the Dropout layers randomly active during the model testing phase and performs multiple forward propagation samplings on the same input sample, obtaining a series of probability distribution outputs under random perturbation. The differences between these distributions reflect the uncertainty caused by cognitive limitations such as insufficient training data and model complexity constraints. In the online recognition and uncertainty assessment module of this invention, the Monte Carlo Dropout method achieves its core function through the following integrated process: First, the system performs 10-100 random Dropout forward propagations on each input sample to generate a set of softmax probability distributions. Then, the cognitive uncertainty of the model is quantified by calculating the average entropy of these distributions or the variance between them. High variance or high entropy values directly reflect the model's insufficient grasp of the feature pattern of the sample. When the calculated uncertainty score exceeds the preset adaptive threshold, the system automatically triggers the active learning mechanism, marks the corresponding sample as a difficult sample, and transfers it to the manual verification process. Finally, these manually verified samples and their labeling results are fed back to the incremental learning module as high-quality training data, thereby specifically strengthening the model's learning ability in the identified weak links and realizing the continuous evolution of the system.
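The process above can be sketched in pure Python under stated assumptions: a toy one-hidden-layer network with fixed weights stands in for the recognition model, dropout is applied to the hidden units with inverted scaling, and the names `mc_dropout` and `is_difficult` are hypothetical. It is an illustration of the sampling and scoring steps, not the patented implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def forward_with_dropout(x, w_hidden, w_out, drop_p, rng):
    """One stochastic pass: ReLU hidden layer with dropout left active at test time."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    keep = 1.0 - drop_p
    hidden = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    return softmax([sum(w * h for w, h in zip(row, hidden)) for row in w_out])


def mc_dropout(x, w_hidden, w_out, passes=30, drop_p=0.5, seed=0):
    """Run several stochastic passes and summarize how much they disagree."""
    rng = random.Random(seed)
    dists = [forward_with_dropout(x, w_hidden, w_out, drop_p, rng) for _ in range(passes)]
    n_cls = len(dists[0])
    mean = [sum(d[c] for d in dists) / passes for c in range(n_cls)]
    avg_entropy = sum(entropy(d) for d in dists) / passes
    variance = sum(
        sum((d[c] - mean[c]) ** 2 for d in dists) / passes for c in range(n_cls)
    ) / n_cls
    return {"mean": mean, "avg_entropy": avg_entropy,
            "variance": variance, "max_prob": max(mean)}


def is_difficult(result, uncertainty_threshold, confidence_threshold):
    """Flag a sample when uncertainty is high or the top class probability is low."""
    return (result["avg_entropy"] > uncertainty_threshold
            or result["max_prob"] < confidence_threshold)
```

Here the average entropy is used as the uncertainty score; the probability variance returned alongside it could be substituted, since the text allows either measure.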

[0024] The human-machine collaborative annotation and feedback correction module provides two interactive interfaces: Web and augmented reality. It integrates an active learning strategy and records all user correction operations. It uses these operations and their corresponding samples as high-quality positive and negative sample pairs, assigns them the highest priority weight, and sends them to the incremental learning and model management module for the next round of fine-tuning training.

[0025] The active learning strategy mainly combines the following core methods. First, uncertainty sampling prioritizes samples from the online recognition module whose confidence is below the threshold or whose uncertainty, as evaluated by the Monte Carlo Dropout method, is high. Second, diversity sampling uses cluster analysis to ensure that the selected samples are widely distributed in the feature space, avoiding repeated selection of similar samples. At the same time, multiple historical models stored in the model version repository are used for prediction, and the samples with the greatest prediction discrepancies among these models are selected. Through weighted combination, these strategies systematically filter the most informative and difficult samples out of massive amounts of unlabeled data and push them to the Web or augmented reality interface to request manual annotation, thereby maximizing the improvement of model performance with minimal manual annotation cost.
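One way to picture the weighted combination is the greedy selector below. It is a sketch under several assumptions: diversity is approximated by the distance to the nearest already-selected sample rather than by the cluster analysis the text describes, model disagreement is the fraction of historical models that deviate from the majority vote, and the weights, field names, and function names are hypothetical.

```python
import math
from collections import Counter


def disagreement(votes):
    """Fraction of historical models whose label differs from the majority label."""
    majority_count = Counter(votes).most_common(1)[0][1]
    return 1.0 - majority_count / len(votes)


def select_samples(candidates, budget, w_unc=0.5, w_div=0.3, w_dis=0.2):
    """Greedily pick samples by a weighted sum of uncertainty, diversity, disagreement."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < budget:

        def score(cand):
            if selected:
                # Distance to the closest sample already chosen, capped at 1.0.
                div = min(math.dist(cand["feature"], s["feature"]) for s in selected)
            else:
                div = 1.0
            return (w_unc * cand["uncertainty"]
                    + w_div * min(div, 1.0)
                    + w_dis * disagreement(cand["votes"]))

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [s["id"] for s in selected]
```

With this scoring, a near-duplicate of an already selected sample loses its diversity credit, so a less uncertain but more distinct sample can be pushed to the annotator first.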

[0026] The adaptive threshold is set using a dynamic statistical method. In the initial stage, the system uses a preset fixed threshold. After running for a period of time, the system periodically analyzes the distribution of the confidence scores of all recognition results. Either the 5th percentile of the confidence score distribution, or its mean minus twice its standard deviation, is set as the new confidence threshold, and the 95th percentile of the uncertainty score distribution is set as the new uncertainty threshold. This allows the thresholds to be adaptively adjusted as the system performance changes.
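The threshold update can be sketched as follows. The default thresholds, the minimum sample count that ends the initial stage, and the function names are assumptions made for illustration; the percentile uses linear interpolation between order statistics.

```python
import statistics


def percentile(values, q):
    """Linear-interpolation percentile, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def update_thresholds(confidences, uncertainties, defaults=(0.6, 0.5),
                      min_samples=100, use_percentile=True):
    """Return (confidence_threshold, uncertainty_threshold).

    Falls back to the preset defaults until enough history has accumulated.
    """
    if len(confidences) < min_samples or len(uncertainties) < min_samples:
        return defaults
    if use_percentile:
        conf_thr = percentile(confidences, 5)
    else:
        # Alternative rule from the text: mean minus twice the standard deviation.
        conf_thr = statistics.fmean(confidences) - 2 * statistics.pstdev(confidences)
    unc_thr = percentile(uncertainties, 95)
    return conf_thr, unc_thr
```

Calling `update_thresholds` on a schedule with the recent score history reproduces the periodic re-estimation; roughly 5% of samples then fall below the confidence threshold and 5% above the uncertainty threshold.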

[0027] Through the above technical solution, this embodiment constructs the core interactive closed loop of the system. Starting from perception and acquisition, it identifies difficult samples through an uncertainty evaluation mechanism with dynamic thresholds, then initiates human-machine collaborative correction, and uses the correction results for model update, thus fully realizing the dynamic adaptive process of perception-recognition-evaluation-feedback-learning.

Example

[0028] Referring to Figure 1, in an AI-based learning system, the core of the incremental learning and model management module is a main recognition model based on the fusion of convolutional neural networks and attention mechanisms. The incremental learning algorithm adopts a hybrid method that combines knowledge distillation and elastic weight consolidation. The incremental learning and model management module also includes a model version repository for saving and rolling back historical models.
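The save and rollback behavior of the model version repository can be pictured with a small in-memory sketch. The class name, method names, and the representation of weights as a plain dictionary are assumptions for illustration; a deployed repository would persist versions to storage.

```python
import copy


class ModelVersionRepository:
    """Keeps every saved model snapshot and tracks which one is active."""

    def __init__(self):
        self._versions = []
        self._active = None

    def save(self, weights, note=""):
        """Store a snapshot and make it the active version; returns its version number."""
        self._versions.append({"weights": copy.deepcopy(weights), "note": note})
        self._active = len(self._versions) - 1
        return self._active

    def rollback(self, version):
        """Reactivate an earlier snapshot, e.g. after an update degrades accuracy."""
        if not 0 <= version < len(self._versions):
            raise IndexError(f"unknown model version {version}")
        self._active = version
        return self.active()

    def active(self):
        """Return a copy of the active weights so callers cannot mutate history."""
        if self._active is None:
            raise LookupError("no model has been saved yet")
        return copy.deepcopy(self._versions[self._active]["weights"])
```

Keeping all snapshots also makes the stored historical models available as a committee for the disagreement-based sample selection described elsewhere in the text.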

[0029] The specific process of the hybrid method of knowledge distillation and elastic weight consolidation is as follows: When the system needs to learn new task data, the currently trained model is first frozen as the teacher network, and a student network with the same structure is initialized at the same time. During training, the student network not only minimizes the standard classification loss on the new data, but also, through knowledge distillation, matches the output probability distribution of the teacher network on representative samples of the old task, so as to retain the memory of the old knowledge. At the same time, the system imposes penalty terms on the network parameters according to their importance during training on the old task, limiting drastic changes to highly important parameters when learning the new task. Finally, the classification loss, distillation loss, and elastic weight penalty loss are weighted and combined for joint optimization, so that the student network can effectively master new knowledge while minimizing the forgetting of the old knowledge it has learned, thereby achieving stable and continuous incremental learning.
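The joint objective can be written out as a short sketch. It assumes the commonly used formulations, which the text does not spell out: a temperature-scaled KL divergence for the distillation term and a quadratic penalty weighted by a diagonal importance estimate (such as Fisher information) for elastic weight consolidation. The weights `w_kd` and `w_ewc`, the temperature, and all function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with optional temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, label):
    """Standard classification loss on a new-task sample."""
    return -math.log(softmax(student_logits)[label])


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by temperature squared."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return temperature ** 2 * kl


def ewc_penalty(params, old_params, importance):
    """Quadratic penalty that resists changes to parameters important for old tasks."""
    return 0.5 * sum(f * (p - o) ** 2
                     for p, o, f in zip(params, old_params, importance))


def joint_loss(new_logits, new_label, old_student_logits, old_teacher_logits,
               params, old_params, importance, w_kd=1.0, w_ewc=1.0):
    """Weighted sum of classification, distillation, and elastic weight penalty losses."""
    return (cross_entropy(new_logits, new_label)
            + w_kd * distillation_loss(old_student_logits, old_teacher_logits)
            + w_ewc * ewc_penalty(params, old_params, importance))
```

The distillation term vanishes when the student reproduces the teacher's outputs on old-task samples, and the penalty term vanishes when the important parameters have not moved, so both terms only act when forgetting would otherwise begin.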

[0030] The main recognition model uses ResNet-50 or EfficientNet-B3 as the feature extraction backbone network, followed by a feature pyramid network structure to achieve multi-scale feature fusion. The model output head is dynamically expanded according to the task. For classification tasks, a fully connected layer is used, and for detection tasks, a region proposal network and a bounding box regressor are used.

[0031] Through the above technical solutions, this embodiment specifically defines the core algorithm for incremental learning and the specific network architecture selection for the main recognition model, providing a technical solution at the model level.

Example

[0032] Referring to Figure 1, in an AI-based learning system, the dynamic scene database and knowledge graph module employs a hierarchical storage structure, including: a raw data layer, a labeled data layer, and a log event layer. It stores and queries data through a graph database, providing contextual reasoning support for the recognition results.

[0033] The storage structure comprises three layers: a raw data layer, a labeled data layer, and a log event layer. The raw data layer stores unprocessed data streams directly collected from sensors, such as RGB images, depth maps, and infrared thermal images, together with metadata like acquisition time and device serial number, forming the system's most basic perception record. The labeled data layer stores structured data after manual or automatic labeling, including target category labels, bounding box coordinates, segmentation masks, and complete labeling information such as labeler, labeling time, and confidence level, forming training material that can be used directly for supervised model training. The log event layer records all key events and interactions throughout the system's operation in a time-series format, including input parameters for each recognition request, model inference results, uncertainty assessment scores, user correction operation records, and system-triggered action commands, forming a complete and traceable system memory. These three layers correspond to raw perception, processed knowledge, and behavioral history, respectively, and are collectively stored and semantically queried using graph database technology. This allows the system to not only store data but also understand the spatiotemporal and logical relationships between data, providing rich contextual reasoning for recognition decisions.
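How the three layers might be linked for traceability can be sketched with an in-memory graph. This stands in for a real graph database, which the text does not name; the class `SceneGraph`, the layer keys, and the relation names used in the usage note are hypothetical.

```python
from collections import defaultdict

LAYERS = {"raw", "labeled", "log"}


class SceneGraph:
    """Tiny directed graph whose nodes are tagged with a storage layer."""

    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(list)

    def add_node(self, node_id, layer, **props):
        """Register a record in one of the three layers."""
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self.nodes[node_id] = {"layer": layer, **props}

    def link(self, src, relation, dst):
        """Add a named relation between two existing records."""
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before linking")
        self.edges[src].append((relation, dst))

    def trace(self, start):
        """Follow outgoing relations depth-first; returns (src, relation, dst) triples."""
        seen, stack, triples = {start}, [start], []
        while stack:
            src = stack.pop()
            for relation, dst in self.edges.get(src, []):
                triples.append((src, relation, dst))
                if dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return triples
```

For example, linking a correction record to the recognition event it corrects, and that event to the raw frame it observed, lets `trace` recover the full evidence chain behind a label.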

[0034] The system bus and data management module is implemented as an asynchronous communication architecture based on message queues, using a publish-subscribe pattern. Each module acts as an independent service, triggering corresponding operations by subscribing to messages on specific topics, supporting distributed deployment and horizontal scaling.
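The publish-subscribe interaction can be illustrated with an in-process sketch built on `asyncio` queues. A production deployment would use a message broker instead; the class `MessageBus`, the topic names, and the payload fields are hypothetical.

```python
import asyncio
from collections import defaultdict


class MessageBus:
    """In-process publish-subscribe bus: one queue per subscriber per topic."""

    def __init__(self):
        self._queues = defaultdict(list)

    def subscribe(self, topic):
        """Register interest in a topic and return the queue that will receive messages."""
        queue = asyncio.Queue()
        self._queues[topic].append(queue)
        return queue

    async def publish(self, topic, message):
        """Deliver a message to every subscriber of the topic."""
        for queue in self._queues[topic]:
            await queue.put(message)


async def annotation_service(inbox, bus):
    """Stand-in for the annotation module: consumes one hard sample, emits a correction."""
    sample = await inbox.get()
    await bus.publish("annotation.completed",
                      {"sample_id": sample["sample_id"], "label": "corrected"})


async def demo():
    bus = MessageBus()
    hard_samples = bus.subscribe("sample.difficult")
    completed = bus.subscribe("annotation.completed")
    worker = asyncio.create_task(annotation_service(hard_samples, bus))
    # The recognition module would publish this when a sample is flagged as difficult.
    await bus.publish("sample.difficult", {"sample_id": "img-42"})
    result = await completed.get()
    await worker
    return result
```

Running `asyncio.run(demo())` walks one message through both topics. Because publishers only know topic names, services can be added or scaled out without changing the modules that emit the messages.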

[0035] The above technical solutions define the structured storage and knowledge organization of data, as well as the underlying communication infrastructure that supports the scalable operation of the entire system, ensuring the system's ability to process massive amounts of data and complex business flows.

[0036] In practice, the system first collects multi-source visual data through the collaborative perception module, preprocesses it, and stores it in the raw data layer. Then, the online recognition module uses the main recognition model to make inferences and applies the Monte Carlo Dropout method to quantify cognitive uncertainty. When the uncertainty exceeds the dynamic threshold, the sample is marked as a difficult sample and the active learning strategy is triggered. Manual annotation is requested through the human-machine collaborative interface, and the corrected high-quality data and operation logs are stored in the knowledge graph module. Next, the incremental learning module applies the hybrid method of knowledge distillation and elastic weight consolidation: under the guidance of the teacher network and combined with the penalty mechanism that protects old knowledge, the student network is fine-tuned using the newly labeled data. The updated model is then put into online recognition again, thus forming a closed loop of perception-recognition-evaluation-annotation-learning-optimization. In this process, all modules communicate asynchronously through a publish-subscribe architecture based on a message queue to achieve efficient data flow and continuous system evolution.

[0037] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit, the scope of which is defined by the appended claims and their equivalents.

Claims

1. An AI-based learning system, characterized in that it comprises: a collaborative perception and data acquisition module, used to connect and manage multiple visual sensors and collect raw environmental perception data; an incremental learning and model management module, used to host and update the core recognition model; an online recognition and uncertainty assessment module, used to recognize real-time or stored data and assess the reliability of the recognition results; a human-machine collaborative annotation and feedback correction module, used to provide a human-machine interaction interface and receive correction information for difficult samples; a dynamic scene database and knowledge graph module, used to store data throughout the entire system operation process and to build semantic relationships between objects; and a system bus and data management module, responsible for the data flow and scheduling between all modules.

2. The AI-based learning system of claim 1, wherein: The collaborative perception and data acquisition module specifically includes: a multi-source sensor interface unit that supports access to RGB cameras, depth cameras, and infrared thermal imagers; an active perception control unit; and a data preprocessing unit. The active perception control unit adjusts the gimbal pose or camera focal length according to the confidence level output by the online recognition and uncertainty assessment module, using a preset strategy, to obtain multi-view or clearer images of the target. The data preprocessing unit performs automatic white balance, noise reduction, and standardized size cropping on the acquired raw images, and binds the processed image data with timestamps, sensor IDs, and acquisition parameter metadata, storing it in the dynamic scene database and knowledge graph module.

3. The AI-based learning system of claim 1, wherein: The core of the incremental learning and model management module is a main recognition model based on the fusion of convolutional neural networks and attention mechanisms. The incremental learning algorithm adopts a hybrid method that combines knowledge distillation and elastic weight consolidation. The incremental learning and model management module also includes a model version repository for saving and rolling back historical models.

4. The AI-based learning system of claim 1, wherein: During inference, the online recognition and uncertainty assessment module, in addition to outputting the category probability distribution, uses the Monte Carlo Dropout method for uncertainty assessment: during the testing phase, the same input sample is forward-propagated multiple times, with some neurons randomly dropped each time, resulting in multiple softmax probability distributions; the average entropy or probability variance of these distributions is calculated as a measure of epistemic (cognitive) uncertainty; simultaneously, the maximum softmax probability output by the model is used as a reference for aleatoric (random) uncertainty; when the overall uncertainty score exceeds the adaptive threshold or the highest category probability is lower than the confidence threshold, the sample is marked as a difficult sample.

5. The AI-based learning system of claim 1, wherein: The human-machine collaborative annotation and feedback correction module provides two interactive interfaces: Web and augmented reality. It integrates an active learning strategy and records all user correction operations. It uses these operations and corresponding samples as high-quality positive and negative sample pairs, assigns them the highest priority weight, and sends them to the incremental learning and model management module for the next round of fine-tuning training.

6. The AI-based learning system of claim 1, wherein: The dynamic scene database and knowledge graph module adopts a hierarchical storage structure, including: a raw data layer, a labeled data layer, and a log event layer. It uses a graph database for storage and related queries, providing contextual reasoning support for recognition results.

7. The AI-based learning system of claim 3, wherein: The main recognition model uses ResNet-50 or EfficientNet-B3 as the feature extraction backbone network, followed by a feature pyramid network structure to achieve multi-scale feature fusion. The model output head is dynamically expanded according to the task. For classification tasks, a fully connected layer is used, and for detection tasks, a region proposal network and a bounding box regressor are used.

8. The AI-based learning system of claim 4, wherein: The adaptive threshold is set using a dynamic statistical method. In the initial stage, the system uses a preset fixed threshold. After running for a period of time, the system periodically analyzes the distribution of the confidence scores of all recognition results. Either the 5th percentile of the confidence score distribution, or its mean minus twice its standard deviation, is set as the new confidence threshold, and the 95th percentile of the uncertainty score distribution is set as the new uncertainty threshold, so that the thresholds adjust adaptively as the system performance changes.

9. The AI-based learning system according to any one of claims 1-8, wherein: The system bus and data management module is implemented as an asynchronous communication architecture based on message queues, using a publish-subscribe pattern. Each module acts as an independent service, triggering corresponding operations by subscribing to messages on specific topics, supporting distributed deployment and horizontal scaling.