Method and device for identifying process-relevant events in a shopping container
A hybrid AI method combining deep neural networks and traditional learning techniques, along with a dedicated AI chip, addresses resource constraints in shopping containers for efficient and accurate event detection, optimizing hardware usage and integrating with retail systems.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- WANZL GMBH & CO KGAA
- Filing Date
- 2025-12-19
- Publication Date
- 2026-06-25
AI Technical Summary
Existing technologies struggle to detect process-relevant events in shopping containers efficiently and scalably using AI, because mobile and embedded systems impose resource constraints that limit hardware and algorithm performance. Existing methods also have difficulty maintaining accuracy and precision as task complexity increases.
The method and device combine deep neural networks for object recognition with traditional machine learning methods, such as decision trees and random forests, for efficient object detection and tracking. They use a dedicated AI chip and state machines to handle process-relevant events, and integrate barcode scanning for precise tracking and event detection.
This approach enhances resource efficiency, scalability, and accuracy in detecting complex shopping scenarios, reducing computational overhead and extending battery life while maintaining high precision and flexibility, enabling seamless integration with retail management systems.
Abstract
Description
Wanzl GmbH & Co. KGaA WO 02-05-25 December 19, 2024 Method and device for detecting process-relevant events in a shopping container
[0001] The present invention relates to a method and a device for detecting process-relevant events in a shopping container, in particular a shopping cart.
[0002] Smart retail is fundamentally changing the traditional retail landscape by using artificial intelligence (AI) to enhance the shopping experience and help businesses remain competitive and attractive in the face of increasing competition from online platforms. AI enables a more efficient shopping process by identifying products and people, resulting in a fast and accurate checkout. Furthermore, the analysis of large amounts of customer data allows retailers to better understand consumer preferences, make informed decisions, predict trends, optimize inventory management, and create a personalized shopping experience. At the same time, AI helps improve the security of both the supply chain and customers by detecting shoplifting and suspicious behavior early on.
[0003] One component of smart retail is intelligent shopping aids, such as smart shopping carts or baskets. These smart shopping aids accompany a person during their shopping trip, observing and monitoring the processes within the shopping container. The data and insights gained can then be made available to a retail sales system to simplify the rest of the shopping process.
[0004] Various AI-based methods are available for monitoring activities within the shopping container, particularly for object recognition and tracking, as well as for classifying and categorizing individual activities. However, since the shopping aids are mobile devices, the available resources are limited, which restricts the possible applications of AI-based methods.
[0005] EP 3 989 105 A1 addresses this issue and highlights the challenges in the field of object and process recognition in mobile shopping aids. EP 3 989 105 A1 describes a system and method for video-based object recognition on a hardware-constrained embedded device specifically configured for the autonomous recognition of shopping items in a shopping cart. The core objective is to optimize deep learning models so that they can be executed efficiently and completely offline, even on devices with limited resources, such as a smartphone or a dedicated camera module. This ensures that the entire recognition process runs locally without the need for a server connection, thus improving data security and privacy protection.
[0006] The system comprises several models, including those for object detection, tracking, re-identification, and motion direction classification, all of which run locally on the embedded device. These models are optimized to operate within the device's limited resources, for example, through data compression and the use of quantization algorithms that accelerate calculations and reduce memory requirements. This makes it possible to complete the shopping process without an internet connection while simultaneously ensuring security by transmitting only the detected objects and their properties, rather than image data.
[0007] By applying optimization methods such as data compression and quantization algorithms, it becomes possible, in principle, to use AI methods even on hardware-constrained embedded units of a shopping aid. However, this optimization has its limitations. While data compression and quantization make the execution of AI models on embedded devices possible in the first place, these techniques quickly reach their limits as the complexity of the tasks increases. Scalability is limited because the optimizations are specifically tailored to the available resources. These systems are tailored to specific tasks, and additional requirements, such as higher accuracy or more complex models, could quickly overwhelm the hardware. Furthermore, progressive compression often leads to a loss of precision, which can impair the quality of the results. Therefore, while such systems are suitable for specific, clearly defined tasks, they are difficult to extend to more complex or variable use cases without risking significant performance degradation.
[0008] Against this background, it is a task to specify a method and a device with which process-relevant events in a shopping container can be detected efficiently and effectively during the purchasing process, taking into account the requirements for resource efficiency and scalability.
[0009] According to one aspect, this task is solved by a method for detecting process-relevant events in a shopping container during a shopping process, comprising: detecting objects using a camera mounted on the shopping container; tracking the detected objects across an image sequence using a tracking method, whereby each object is assigned a unique tracking number; generating a state machine for each tracked object; applying heuristic algorithms to determine process-relevant events based on the states and/or state changes of the generated state machines for each tracked object; and providing the process-relevant events at a defined interface.
[0010] According to another aspect, the task is solved by a device for detecting process-relevant events in a shopping container during a shopping process with a processing unit that is set up to detect objects by means of a camera arranged on the shopping container, to track the detected objects over an image sequence using a tracking method, whereby each object is assigned a unique identification number, to generate a state machine for each tracked object, to apply heuristic algorithms to determine process-relevant events based on states and / or state changes of the generated state machines for each tracked object, and to provide the process-relevant events at a defined interface.
[0011] The aforementioned method and the associated device therefore utilize a combination of deep neural networks for object recognition and traditional machine learning methods such as decision trees and random forests for the detection of process-relevant events (heuristic algorithms). This method offers significant advantages over systems based solely on deep neural networks. By employing traditional machine learning methods, which require considerably less computing power, the overall effort can be significantly reduced. This leads to better resource utilization, which is particularly important for embedded systems with limited hardware capacity, such as those commonly found in mobile or edge devices. This increase in efficiency makes it possible to reliably perform complex recognition tasks even on such devices without unduly impacting their performance or battery life.
[0012] Furthermore, combining these methods offers greater flexibility in scaling the solution. While deep neural networks are optimized for precise object recognition, the more traditional methods allow for a fast and resource-efficient analysis of the recognized objects and their states. This makes it possible to use the system not only for specific use cases but also for more variable scenarios without a significant increase in hardware requirements. The use of heuristic algorithms for process event detection further contributes to the system's robustness by making the detection and classification of state changes more efficient while making optimal use of computing resources. This combination results in a powerful, scalable solution suitable for both simple and complex use cases in smart retail.
[0013] In a further embodiment, the method also includes: detecting barcodes on the tracked objects; executing a barcode scanning process in which each scanned barcode is assigned to at least one tracked object; and triggering a state transition of each state machine of a tracked object to which a barcode has been assigned.
[0014] This further development of the process integrates the detection of barcodes on the tracked objects, representing an additional step in the process flow. Examples of barcodes include the classic EAN-13 code, frequently found on consumer goods; the QR code, often used for rapid digital links; and Code 128, widely used in logistics and shipping.
[0015] A key advantage of this enhancement is the increased accuracy and efficiency in identifying and tracking objects during the purchasing process. By uniquely assigning a barcode to a tracked object, the system ensures that the object's status is updated precisely and reliably. This leads to improved control and traceability of individual objects within the purchasing container, minimizing errors such as misassigning items or overlooking changes. Furthermore, this design enables automated state transitions to be triggered in the state machine, further automating and accelerating the process. The widespread use of barcodes facilitates the integration of this method into existing purchasing environments, as no additional identification features or technologies are required. This makes the system not only cost-efficient but also quick to implement, thereby increasing the efficiency and accuracy of the purchasing process.
[0016] Preferably, the process-relevant events can only be provided for the tracked objects to which a barcode has been assigned.
[0017] In this preferred configuration, process-relevant events are specifically provided only for those tracked objects to which a barcode has been assigned. This offers the advantage that the system focuses on the objects that are actually important and that have been uniquely identified by the barcode. This avoids using resources to track and process objects that lack a unique identifier or are irrelevant to the purchasing process. Focusing on objects with assigned barcodes can further increase the system's efficiency. It reduces the amount of data and the complexity of the information to be processed, which in turn shortens the system's response time and increases the accuracy of the provided events. At the same time, the probability of misclassifications or incorrect state transitions is minimized, as only objects that can be uniquely identified and tracked by their barcode are considered.
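The restriction to barcoded objects can be illustrated with a minimal sketch. The names `Event`, `filter_events`, and `barcode_by_track` are hypothetical and do not appear in the application; the barcode value is an illustrative EAN-13 number.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    track_id: int   # unique tracking number of the object
    kind: str       # e.g. "Added" or "Removed"


def filter_events(events: List[Event], barcode_by_track: Dict[int, str]) -> List[Event]:
    """Keep only events whose tracked object has an assigned barcode."""
    return [e for e in events if e.track_id in barcode_by_track]


events = [Event(1, "Added"), Event(2, "Added"), Event(1, "Removed")]
barcode_by_track = {1: "4006381333931"}  # only track 1 has been scanned
provided = filter_events(events, barcode_by_track)
```

In this sketch the events of track 2 are simply dropped; a real system might instead hold them back until a barcode is assigned.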
[0018] In a further embodiment, the process-relevant events can include at least adding, removing or covering objects as an event.
[0019] This has the advantage that not only basic actions such as adding or removing items from the shopping container are recorded, but also more complex scenarios such as the intentional or unintentional covering of objects can be monitored and mapped.
[0020] In a further development, object detection can include neural object recognition.
[0021] Integrating neural object recognition into the process for detecting process-relevant events offers several advantages. Using deep neural networks for object recognition significantly increases the precision and reliability of the detection. Neural networks are capable of recognizing even complex patterns and features in the image data that are difficult for conventional algorithms to access. This leads to higher accuracy in identifying objects in the shopping container, regardless of their shape, size, or orientation.
[0022] Preferably, deep neural networks are used exclusively for object recognition and, if necessary, for object tracking, while efficient heuristic methods are used for all other process steps. This combination makes it possible to selectively deploy the high computing power required by neural networks where it is most beneficial: the accurate identification and tracking of objects. At the same time, the overall computing power requirement is significantly reduced, since the heuristic methods used for other tasks, such as event detection and state processing, are considerably more resource-efficient. This has the advantage that the system can be operated efficiently on devices with limited hardware resources. The available computing power can thus be optimally utilized by allocating the majority of its capacity to demanding object recognition without compromising the overall system performance. This achieves high precision in object detection and tracking while maintaining system efficiency, which is particularly important in mobile or embedded systems.
[0023] In a further embodiment, object detection can include classifying the objects into at least three classes, with a first class being objects corresponding to human hands, a second class being objects corresponding to shopping items, and a third class being objects corresponding to barcodes.
[0024] Classifying the detected objects into at least three categories—human hands, shopping items, and barcodes—offers several advantages. This classification allows the system to specifically capture and analyze actions and interactions during the shopping process. By separately recognizing objects resembling human hands, the system can precisely determine when and how an item is manipulated, such as when adding or removing it from the shopping container.
[0025] Distinguishing between items and barcodes allows for a more efficient purchasing process by focusing only on relevant objects and their identifying characteristics. This differentiation helps minimize misclassifications and increase the accuracy of recognition processes. The clear separation of classes ensures that the system can respond flexibly to various scenarios, such as when a barcode is detected that should be linked to an item, or when a hand movement is interpreted as interaction with a specific object. This classification thus improves not only the precision but also the efficiency of the entire system by enabling targeted processing and analysis of the relevant objects.
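The three-class scheme can be represented with a small data structure. The following is only a sketch: the type names, the `(x1, y1, x2, y2)` box format, and the confidence field are assumptions made for illustration, not details from the application.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class ObjectClass(Enum):
    HAND = "hand"        # first class: human hands
    ITEM = "item"        # second class: shopping items
    BARCODE = "barcode"  # third class: barcodes


@dataclass(frozen=True)
class Detection:
    cls: ObjectClass
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels, assumed format
    score: float                            # detector confidence in [0, 1]


def group_by_class(detections: List[Detection]) -> Dict[ObjectClass, List[Detection]]:
    """Split one frame's detections so that each class can be processed separately."""
    grouped: Dict[ObjectClass, List[Detection]] = {c: [] for c in ObjectClass}
    for det in detections:
        grouped[det.cls].append(det)
    return grouped


frame = [
    Detection(ObjectClass.HAND, (10, 10, 60, 80), 0.91),
    Detection(ObjectClass.ITEM, (40, 50, 120, 140), 0.88),
    Detection(ObjectClass.BARCODE, (70, 90, 100, 110), 0.75),
]
grouped = group_by_class(frame)
```

Grouping by class lets later steps, such as linking a barcode to an item or interpreting a hand movement, work on exactly the detections they need.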
[0026] In a further development, the tracking method can be a SORT method.
[0027] The tracking method can be implemented as a SORT (Simple Online and Realtime Tracking) method, which offers further advantages. The SORT method is characterized by its efficiency and real-time capability, as it was specifically developed for tracking objects in real time. It combines the advantages of fast motion data processing with the accuracy of modern tracking algorithms. By using SORT, objects in the shopping container can be tracked continuously and precisely, even if the scene changes dynamically or multiple objects need to be monitored simultaneously.
[0028] Another advantage of the SORT algorithm is its scalability. It is lightweight and can therefore be used efficiently even on devices with limited computing resources. This makes it particularly suitable for use in embedded systems, such as mobile shopping aids. Using SORT ensures that the system reacts quickly to changes in the shopping cart while minimizing the computational load, resulting in overall improved system performance and user-friendliness.
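To illustrate why this kind of tracking is lightweight, the sketch below shows association by bounding-box overlap (intersection over union, IoU). It is deliberately simplified and is not SORT itself: the published SORT algorithm additionally predicts box motion with a Kalman filter and solves the assignment with the Hungarian algorithm, whereas this sketch uses greedy matching and no motion model. The class name, the threshold of 0.3, and the box format are assumptions.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Simplified tracker: greedy IoU matching, no motion model."""

    def __init__(self, iou_threshold: float = 0.3) -> None:
        self.iou_threshold = iou_threshold
        self.tracks: Dict[int, Box] = {}  # tracking number -> last known box
        self._next_id = 1

    def update(self, detections: List[Box]) -> Dict[int, Box]:
        """Match detections to tracks; unmatched detections start new tracks."""
        pairs = sorted(
            ((iou(box, det), tid, i)
             for tid, box in self.tracks.items()
             for i, det in enumerate(detections)),
            reverse=True,
        )
        assigned: Dict[int, Box] = {}
        used = set()
        for score, tid, i in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap even less
            if tid in assigned or i in used:
                continue
            assigned[tid] = detections[i]
            used.add(i)
        for i, det in enumerate(detections):
            if i not in used:
                assigned[self._next_id] = det
                self._next_id += 1
        self.tracks = assigned  # unmatched tracks are dropped in this sketch
        return dict(assigned)


tracker = GreedyIouTracker()
first = tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)])
second = tracker.update([(51, 51, 61, 61), (1, 1, 11, 11)])
```

Both objects keep their tracking numbers in the second frame even though the detections arrive in a different order, which is the property the later state machines rely on.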
[0029] In particular, the SORT method is a deepSORT method, a strongSORT method, an OC-SORT and / or a HybridSORT method.
[0030] Extending the tracking method to variants such as deepSORT, strongSORT, OC-SORT, or HybridSORT offers additional advantages for tracking objects in shopping containers. These advanced methods combine the real-time capability and efficiency of the original SORT method with improved accuracy and robustness in object tracking. DeepSORT extends the basic principle by integrating deeper neural networks, enabling more accurate re-identification of objects even if they temporarily disappear from view or overlap. This significantly increases the reliability of tracking in complex scenarios.
[0031] StrongSORT and OC-SORT offer even more advanced algorithms, characterized by their robust handling of challenges such as rapid scene changes or sudden object movements. HybridSORT, in turn, combines different approaches to leverage the strengths of each method and provide a particularly flexible and adaptable tracking solution. These variants ensure that the system not only functions in a wide variety of use cases but also achieves consistently high accuracy and efficiency across different types of shopping environments and user behavior. The advanced methods enable the system to quickly adapt to changes in the scene while simultaneously optimizing resource utilization, which is particularly advantageous in embedded or mobile applications.
[0032] In a further embodiment, each state machine can distinguish between at least 7 states.
[0033] The design, in which each state machine can distinguish between at least seven states, enables differentiated and precise monitoring of the processes within the shopping container. This multitude of states allows the system to capture and analyze complex interactions and changes in the detected objects in detail. This means that not only basic actions such as adding or removing items can be precisely detected, but also more subtle changes, such as moving an object within the container or overlapping items.
[0034] This leads to greater accuracy in the detection of process-relevant events, as the system is able to clearly distinguish between different states and their transitions and process them accordingly. The ability to differentiate between multiple states also helps the system respond better to different scenarios and user behavior. This increases the system's flexibility and adaptability, which is particularly advantageous in variable purchasing environments, as it offers the possibility of monitoring various user actions and object manipulations in detail and making precise decisions based on this data.
[0035] In particular, the state machine can assume a first state when a tracked object is first detected, a second state when detection of the object is lost, a third state when a barcode is detected on the object, a fourth state when the object is added to the shopping container, a fifth state when the object has already been added to the shopping container, a sixth state when the object is removed from the shopping container, and a seventh state when the object is detected again.
[0036] The design, in which the state machine can distinguish between different specific states, enables detailed tracking and analysis of interactions with the objects in the shopping container. Each of these states represents a clearly defined moment in an object's lifecycle during the shopping process. For example, the first state might indicate that an object has been detected for the first time, while the second state signals that the object's detection was temporarily lost. This detailed differentiation allows the system to react precisely to changes and monitor the status of each object in real time.
[0037] A key advantage of this granular state tracking is the system's ability to seamlessly document the entire process, from object detection to final processing. This not only ensures increased accuracy but also improved fault tolerance, as the system can correctly re-identify objects and update their status even after temporary detection losses. Furthermore, the ability to detect specific states, such as adding or removing an object from the shopping container, enables precise control and monitoring of purchasing processes, which in turn enhances the user experience and increases system efficiency.
[0038] In a further embodiment, the process steps can be executed on a common hardware platform.
[0039] The design, in which all process steps are executed on a common hardware platform, offers further advantages in terms of efficiency and integration. By using a single platform, the various processes can seamlessly interact, which accelerates and simplifies communication between the individual modules. This not only reduces latency but also enables more coherent and coordinated data processing, as all relevant information can be processed centrally.
[0040] Furthermore, it is advantageous to use a common hardware platform rather than relying on an end-user device such as a mobile phone or similar device and its capabilities. While end-user devices offer some flexibility, they are often not optimally designed for the specific requirements of process-relevant event detection in a shopping container. These devices may lack the necessary computing power, sensor integration, or energy efficiency to reliably and continuously handle the tasks.
[0041] In contrast, a dedicated, shared hardware platform allows for targeted optimization for the specific requirements of the system. It can be configured so that all necessary sensors, processors, and algorithms work together efficiently without being limited by the constraints of an end-user device. This leads to overall higher system stability and performance, as the hardware is specifically tailored to the application's needs and is not dependent on the variable and often unpredictable resources of an end-user device. This increases reliability and ensures that detection processes run smoothly and effectively even under demanding conditions.
[0042] In particular, the hardware platform can include at least one dedicated AI chip.
[0043] The design, in which the hardware platform utilizes a dedicated AI chip for processing, offers advantages in terms of performance and efficiency. The use of a specially developed AI chip provides significantly higher computing power, enabling the real-time execution of complex deep learning algorithms. This results in faster and more precise processing of image and sensor data, which is particularly crucial in dynamic retail environments.
[0044] Another advantage is improved energy efficiency. Because the AI chip is specifically optimized for running neural networks, it requires less power to perform demanding calculations. This is particularly important for mobile and embedded systems, as lower energy consumption directly translates to longer operating times without compromising performance.
[0045] Additionally, the use of a dedicated AI chip enables better integration of the various system components. Since all relevant calculations are concentrated on a single chip, communication between processes can be made more efficient. This reduces latency and enables smooth and coordinated processing, thereby optimizing the overall system performance. These advantages contribute to making the system not only more powerful but also more robust and reliable, which is particularly important in demanding application scenarios.
[0046] In a further configuration, the defined interface can be an interface to a retail management system or an end-user facility.
[0047] The defined interface, which acts as a connection to a retail management system or end-user facility, offers advantages that contribute to improving the efficiency and flexibility of the overall system. This interface enables seamless integration of the system into existing retail infrastructures, allowing data and process-relevant events to be transmitted to central management systems in real time. This facilitates better control and management of purchasing processes, which in turn increases the efficiency of inventory management and customer service.
[0048] Direct integration with a retail management system allows for the immediate integration of identified process-relevant events into existing business processes, whether for automated inventory updates, support of marketing strategies, or improved logistics. This can lead to optimized use of data gathered during the purchasing process and contributes to streamlining retail operations.
[0049] At the same time, connecting to an end-user device, such as a customer's mobile device, can enhance the individual shopping experience. Customers can receive real-time information about their purchases, which speeds up the checkout process and enables personalized offers. The interface allows for direct interaction with the customer, strengthening customer loyalty and improving the overall shopping experience.
[0050] This integration expands the system's functionality, as it no longer operates in isolation but is actively integrated into the broader retail environment. This leads to better utilization of technological capabilities and supports retailers in operating more efficiently and with a greater focus on customer needs, without requiring extensive modifications to existing systems.
[0051] It is understood that the features mentioned above and those to be explained below can be used not only in the combinations specified, but also in other combinations or on their own, without leaving the scope of the present invention.
[0052] Exemplary embodiments of the invention are shown in the drawing and are explained in more detail in the following description. Fig. 1 shows an embodiment of a device for detecting process-relevant events in a shopping container. Fig. 2 shows in a block diagram the steps of a procedure for detecting process-relevant events in a shopping container. Fig. 3 shows a schematic representation of an embodiment of a state machine that represents the different states of an object in a shopping container during the shopping process. Fig. 4 shows a possible application scenario for a device and a method according to an embodiment of the present disclosure.
[0053] Fig. 1 shows an embodiment of a device for detecting process-relevant events in a shopping container. The device as a whole is designated by the reference numeral 100.
[0054] The device includes a hardware component that can be configured as an embedded system. At the center of the device is a processing unit 102, which performs the process steps described in more detail below. The processing unit 102 can be implemented as a single microcontroller, a combination of several microcontrollers, or as an integrated system, for example, as a system-on-a-chip (SoC). In addition to a general processing unit, the processing unit 102 can, in particular, include dedicated units for processing image data or a computer chip specifically designed for performing neural network calculations.
[0055] The device 100 is connected to at least one camera unit 106 via a first interface 104. This first interface 104 can be configured as either a wired or a wireless connection. In this embodiment, the camera unit 106 is shown as a separate unit from the device 100, but in another embodiment, it can also be integrated into the housing 108 of the device 100.
[0056] The camera unit 106 is configured to continuously record images of a shopping container and transmit them to the processing unit 102 via the first interface 104. In addition to the shopping container, the camera unit 106 can also record the immediate surroundings of the shopping container, provided this is relevant for later analysis.
[0057] The device 100 is connected via a second interface 110, preferably a wireless connection, to a retail management system or an end-user device (not shown here). This connection serves to transmit the process-relevant events detected by the processing unit 102 to the connected system for further processing. The detected process-relevant events can include, at a minimum, the adding, removing, or covering of objects, although this list is not exhaustive.
[0058] Figure 2 illustrates a block diagram depicting the steps of a procedure for detecting process-relevant events in a shopping container during a shopping transaction. The entire procedure is designated by the reference number 200.
[0059] Procedure 200 begins with the detection of objects (step 202) based on image material captured by a camera unit mounted on the shopping container. These objects are identified in a continuous image sequence. Object detection can be achieved using advanced detectors such as YOLOv5 or YOLOv8. These detectors are characterized by their high accuracy and efficiency, as they are capable of recognizing objects in real time, even in complex scenes or when numerous objects are present in the field of view. YOLOv5 and YOLOv8 utilize optimized neural networks specifically trained for object detection in various environments, ensuring precise and reliable detection.
[0060] Following object detection, a tracking algorithm (step 204) is applied, assigning a unique tracking number to each detected object. This tracking algorithm can be based on advanced methods such as deepSORT, strongSORT, OC-SORT, or Hybrid-SORT. These algorithms combine the efficiency of the original SORT algorithm with additional features that improve the accuracy and robustness of the tracking. DeepSORT enhances tracking by integrating deep neural networks to enable precise re-identification of objects, even if they temporarily disappear from view. StrongSORT and OC-SORT offer even more robust algorithms specifically optimized for handling rapid scene changes and sudden object movements. HybridSORT combines different approaches to leverage the strengths of each method and provide a highly adaptable tracking solution.
[0061] For each tracked object, a state machine is generated that stores the object's states and defines the transitions between these states (step 206). Based on the states and their transitions, heuristic algorithms are used to identify process-relevant events (step 208). These events can include actions such as adding, removing, or covering objects. The identified process-relevant events are then made available at a defined interface for further processing (step 210).
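The sequence of steps 202 to 210 can be sketched as a loop over frames with one pluggable function per step. This is an illustrative skeleton only: the function names, the use of strings as states, and the toy stand-ins for detector, tracker, and heuristics are all assumptions, chosen so that the example runs without a camera or a neural network.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def run_pipeline(
    frames: Iterable[List[Box]],
    detect: Callable[[List[Box]], List[Box]],
    track: Callable[[List[Box]], Dict[int, Box]],
    step: Callable[[str, bool], str],
    derive_event: Callable[[str, str], Optional[str]],
    publish: Callable[[int, str], None],
) -> Dict[int, str]:
    """Run steps 202-210 for every frame and return the final state per track."""
    states: Dict[int, str] = {}
    for frame in frames:
        boxes = detect(frame)                       # step 202: object detection
        tracks = track(boxes)                       # step 204: tracking numbers
        for tid in sorted(set(states) | set(tracks)):
            old = states.get(tid, "TrackCreated")   # step 206: state per object
            new = step(old, tid in tracks)
            states[tid] = new
            event = derive_event(old, new)          # step 208: heuristic event
            if event is not None:
                publish(tid, event)                 # step 210: defined interface
    return states


# Toy stand-ins (hypothetical): frames already contain boxes, IDs follow list order.
def toy_step(state: str, visible: bool) -> str:
    if not visible:
        return "DetectionLost"
    if state == "TrackCreated":
        return "FirstTimeDetected"
    if state == "DetectionLost":
        return "Reidentified"
    return state


def toy_event(old: str, new: str) -> Optional[str]:
    return new if new != old else None


log: List[Tuple[int, str]] = []
final = run_pipeline(
    frames=[[(0, 0, 1, 1)], [], [(0, 0, 1, 1)]],
    detect=lambda frame: frame,
    track=lambda boxes: {i + 1: b for i, b in enumerate(boxes)},
    step=toy_step,
    derive_event=toy_event,
    publish=lambda tid, event: log.append((tid, event)),
)
```

Keeping each step behind its own function mirrors the division of labour described above: only `detect` (and possibly `track`) would need a neural network, while the remaining steps are plain, inexpensive logic.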
[0062] In a further embodiment, the method can additionally include the detection of barcodes on the tracked objects. A barcode scanning process is then performed, in which each detected barcode is assigned to at least one tracked object. This process triggers a state transition in the state machines of the affected objects and can be included in the determination of process-relevant events.
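The application does not specify how a detected barcode is matched to a tracked object. One plausible heuristic, shown here purely as an assumption, assigns the barcode to the smallest tracked item box that contains the centre of the barcode box and then moves that object's state to "Scanned". All names and the barcode value are illustrative.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def area(box: Box) -> float:
    return (box[2] - box[0]) * (box[3] - box[1])


def assign_barcode(barcode_box: Box, item_tracks: Dict[int, Box]) -> Optional[int]:
    """Return the tracking number of the smallest item box containing the barcode centre."""
    cx, cy = center(barcode_box)
    candidates = [
        tid for tid, b in item_tracks.items()
        if b[0] <= cx <= b[2] and b[1] <= cy <= b[3]
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda tid: area(item_tracks[tid]))


item_tracks = {7: (0, 0, 100, 100), 8: (40, 40, 80, 80)}
states = {7: "FirstTimeDetected", 8: "FirstTimeDetected"}
barcodes: Dict[int, str] = {}

hit = assign_barcode((55, 55, 65, 65), item_tracks)
if hit is not None:
    barcodes[hit] = "4006381333931"  # decoded value, illustrative EAN-13
    states[hit] = "Scanned"          # state transition triggered by the assignment
```

Preferring the smallest containing box resolves the case where one item partly lies in front of another; other tie-breaking rules would be equally conceivable.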
[0063] Fig. 3 shows a schematic diagram of a state machine 300, which represents the different states of an object in the shopping container during the shopping process. The state machine 300 begins with the state "TrackCreated" 302, in which a new object is recognized and a corresponding tracking ID is created.
[0064] The diagram then splits into two parallel paths. In the first path, the object is classified as "FirstTimeDetected" 304 when it is first detected, and can then progress to the "Scanned" state 306 once the object is scanned, for example, after a barcode on the object has been detected and read. From the "Scanned" state 306, the object can progress to the "Added" state 308 when it is added to the shopping container, or to the "Removed" state 310 when it is removed from the shopping container.
[0065] The second parallel path classifies the object as "AlreadyAdded" 312 if it has already been detected and added to the shopping container. Here too, the object can transition from the "Scanned" state 306 to the "Added" state 308 or the "Removed" state 310.
[0066] If an object is temporarily not detected in a previous state, it goes into the "DetectionLost" state 314, from where it either goes into the "Reidentified" state 316 if the object is detected again, or into the "TrackDeleted" state 318 if the object can no longer be tracked.
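The states of Fig. 3 described above can be sketched as a small per-object state machine. The state names follow the description; the trigger names (`first_detection`, `barcode_read`, and so on) and the exact transition table are assumptions for illustration, since the conditions are defined in the figure rather than in the text. What happens after "Reidentified" is left open here.

```python
from enum import Enum, auto


class State(Enum):
    TRACK_CREATED = auto()        # 302
    FIRST_TIME_DETECTED = auto()  # 304
    SCANNED = auto()              # 306
    ADDED = auto()                # 308
    REMOVED = auto()              # 310
    ALREADY_ADDED = auto()        # 312
    DETECTION_LOST = auto()       # 314
    REIDENTIFIED = auto()         # 316
    TRACK_DELETED = auto()        # 318


# (current state, hypothetical trigger) -> next state
TRANSITIONS = {
    (State.TRACK_CREATED, "first_detection"): State.FIRST_TIME_DETECTED,
    (State.TRACK_CREATED, "known_in_container"): State.ALREADY_ADDED,
    (State.FIRST_TIME_DETECTED, "barcode_read"): State.SCANNED,
    (State.ALREADY_ADDED, "barcode_read"): State.SCANNED,
    (State.SCANNED, "moved_in"): State.ADDED,
    (State.SCANNED, "moved_out"): State.REMOVED,
    (State.DETECTION_LOST, "detected_again"): State.REIDENTIFIED,
    (State.DETECTION_LOST, "track_expired"): State.TRACK_DELETED,
}


class ObjectStateMachine:
    """One instance per tracked object; keeps the state history."""

    def __init__(self, track_id: int):
        self.track_id = track_id
        self.state = State.TRACK_CREATED
        self.history = [self.state]

    def fire(self, trigger: str) -> State:
        # Losing the detection is possible from any state except TrackDeleted.
        if trigger == "detection_lost" and self.state is not State.TRACK_DELETED:
            nxt = State.DETECTION_LOST
        else:
            # Unknown triggers leave the state unchanged.
            nxt = TRANSITIONS.get((self.state, trigger), self.state)
        if nxt is not self.state:
            self.state = nxt
            self.history.append(nxt)
        return self.state
```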
[0067] The transitions between states are triggered by various events, which are described in the diagram by corresponding symbols and conditions. These transitions are assigned probabilities and reliability values to ensure the reliability of the classification and the state changes. By adjusting the conditions, it is possible to parameterize the state machine.
[0068] The state machine structure, as shown in Fig. 3, enables precise and continuous tracking of the objects in the shopping container and supports the detection of process-relevant events during the shopping process. It should be noted that the state machine is not limited to the structure shown in Fig. 3. In another embodiment, the state machine can include more, fewer, or even different states to cover specific application requirements. This flexibility allows the state machine to be adapted to different scenarios and levels of complexity to ensure precise and efficient tracking of the objects in the shopping container.
[0069] The generation of process-relevant events is the final step in the described procedure and is based on the analysis of states and state changes within the state machine, as well as the application of heuristic algorithms. These events are then provided via a defined interface, such as MQTT, and made available to a retailer application. The identified events can include, among others, the initial detection of an object (FirstTimeDetected), the capture of a barcode (BarcodeDetected), the scanning of an object (Scanned), as well as the addition (Added), removal (Removed), obscuration (Occluded), and re-uncovering (Revealed) of objects (see Fig. 3).
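How an identified event might be prepared for a defined interface such as MQTT can be sketched as follows. The topic layout, the field names, and the class `ProcessEvent` are purely hypothetical; the disclosure does not specify a message format. The sketch only builds the topic string and a JSON payload and leaves the actual publishing to whichever MQTT client is used.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ProcessEvent:
    cart_id: str                    # hypothetical identifier of the shopping cart
    track_id: int                   # tracking number of the object
    event: str                      # e.g. "Added", "Removed", "Scanned"
    timestamp: float                # seconds since the epoch
    barcode: Optional[str] = None   # EAN code, if one was captured


def to_message(ev: ProcessEvent) -> tuple[str, bytes]:
    """Return (topic, payload) for an MQTT-style publish call."""
    topic = f"carts/{ev.cart_id}/events/{ev.event}"  # assumed topic layout
    payload = json.dumps(asdict(ev), sort_keys=True).encode("utf-8")
    return topic, payload
```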
[0070] Among other things, an algorithm for object movement detection can be used to identify these events. This involves using information such as the object's bounding box (including position, area, and center point), the speed of the track (including direction and magnitude), the detection class, and the track ID. This basic information is processed to determine whether a product is inside or outside the shopping cart, in a transition zone between leaving and entering the cart, or whether it is being moved into or out of the cart.
[0071] To detect whether a product is being moved into a shopping cart, the object's direction of movement can be analyzed. This must fall within a defined tolerance range for the respective sector to correctly identify the event. The same applies to detecting movement out of the shopping cart. To detect whether an unknown product is being scanned, the system can first check if the product's barcode has already been captured. If a barcode is recognized and successfully read, the product can be registered as known, and the corresponding EAN code can be stored if necessary.
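The direction check described above can be sketched as follows. It assumes image coordinates with the y-axis pointing down and that "into the cart" corresponds to a direction of 90 degrees; the speed threshold and the tolerance are likewise assumed values, and the function names are hypothetical.

```python
import math


def direction_deg(vx: float, vy: float) -> float:
    """Direction of a velocity vector in degrees, in the range [0, 360)."""
    return math.degrees(math.atan2(vy, vx)) % 360.0


def within_tolerance(angle_deg: float, expected_deg: float, tolerance_deg: float) -> bool:
    """True if the smallest angular difference is within the tolerance."""
    diff = abs((angle_deg - expected_deg + 180.0) % 360.0 - 180.0)
    return diff <= tolerance_deg


def classify_motion(vx: float, vy: float,
                    speed_threshold: float = 2.0,   # assumed, pixels per frame
                    into_cart_deg: float = 90.0,    # assumed sector direction
                    tolerance_deg: float = 45.0) -> str:
    """Classify a track velocity as stationary, moving_in, moving_out, or other."""
    if math.hypot(vx, vy) < speed_threshold:
        return "stationary"
    angle = direction_deg(vx, vy)
    if within_tolerance(angle, into_cart_deg, tolerance_deg):
        return "moving_in"
    if within_tolerance(angle, (into_cart_deg + 180.0) % 360.0, tolerance_deg):
        return "moving_out"
    return "other"
```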
[0072] Other relevant events include, for example, the detection of whether a product is being held by hand, which is determined by a classifier (e.g., decision tree, random forest, or neural network). This can take into account the relative overlap of the bounding boxes of the hand and the product, the speed, and the change in the distance between their centers. Similarly, it can be checked whether a product is in motion by comparing the object's speed to a defined threshold. If a product is lost, this can be detected when the object detector no longer recognizes the product in a current frame. If the number of frames in which the product could not be tracked exceeds a defined threshold, the product's track is discarded.
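The features named above for the hand-holding check can be computed as in the following sketch. The threshold rule in `is_held` is only a stand-in for the classifier (decision tree, random forest, or neural network) mentioned in the description, the speed feature is omitted for brevity, and all function names and threshold values are assumptions.

```python
import math

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def center(b: Box) -> tuple[float, float]:
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)


def relative_overlap(hand: Box, product: Box) -> float:
    """Intersection area divided by the product's area."""
    iw = max(0.0, min(hand[2], product[2]) - max(hand[0], product[0]))
    ih = max(0.0, min(hand[3], product[3]) - max(hand[1], product[1]))
    pa = area(product)
    return (iw * ih) / pa if pa > 0 else 0.0


def hand_features(hand_prev: Box, product_prev: Box,
                  hand_now: Box, product_now: Box) -> dict[str, float]:
    """Overlap in the current frame and change of the center distance."""
    d_prev = math.dist(center(hand_prev), center(product_prev))
    d_now = math.dist(center(hand_now), center(product_now))
    return {
        "overlap": relative_overlap(hand_now, product_now),
        "distance_change": d_now - d_prev,
    }


def is_held(features: dict[str, float],
            min_overlap: float = 0.2,              # assumed threshold
            max_distance_change: float = 5.0) -> bool:  # assumed threshold
    """Stand-in rule: hand overlaps the product and both move together."""
    return (features["overlap"] >= min_overlap
            and abs(features["distance_change"]) <= max_distance_change)
```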
[0073] The actual generation of process-relevant events can ultimately be achieved through a series of specific checks. For example, camera occlusion is detected if no detections are present in an image of the sequence, even though objects were detected in the previous image. This finding can be further supported by a darkening of the average image brightness. During such occlusion, the old tracks are saved, and if new objects without a previous track ID are detected after the occlusion, they are added to the system.
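The camera occlusion check can be sketched as follows. Whether the darkening is required or only treated as supporting evidence is not fixed by the description; this sketch assumes it is required by default and exposes that choice as a parameter. The function name and the 40 percent brightness drop are assumptions.

```python
def camera_occluded(prev_detections: int, curr_detections: int,
                    prev_brightness: float, curr_brightness: float,
                    min_brightness_drop: float = 0.4,   # assumed relative drop
                    require_darkening: bool = True) -> bool:
    """Heuristic occlusion check between two consecutive images.

    The brightness values are mean pixel intensities of the two images.
    """
    # Objects were visible before, but nothing is detected now.
    lost_all = prev_detections > 0 and curr_detections == 0
    if not lost_all:
        return False
    if not require_darkening or prev_brightness <= 0:
        return True
    drop = (prev_brightness - curr_brightness) / prev_brightness
    return drop >= min_brightness_drop
```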
[0074] Another event that can be detected is adding an item without prior scanning. This is identified when the sequence of events "FirstTimeDetected" followed by "Added" occurs, without a "Scanned" event in between. Similarly, removing an item without rescanning can be identified when the events "Added," "Removed," and "DetectionLost" occur sequentially, again without the "Scanned" event. Technically, a state can be stored for each product, which includes the scan status. If a product was already scanned when added, the barcode is stored in the product's state, thus eliminating the need for rescanning when the product is removed.
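The sequence checks described above can be sketched with a single helper that looks for an ordered sequence of events with no forbidden event in between. The helper name `occurs_without` and the representation of the event history as a list of strings are assumptions; other events in between (for example "BarcodeDetected") are ignored in this sketch.

```python
def occurs_without(events: list[str], sequence: list[str], forbidden: str) -> bool:
    """True if `sequence` occurs in order in `events` with no `forbidden`
    event between its first and last element."""
    idx = 0
    for ev in events:
        if idx > 0 and ev == forbidden:
            idx = 0          # forbidden event seen: start matching again
            continue
        if ev == sequence[idx]:
            idx += 1
            if idx == len(sequence):
                return True
    return False


def added_without_scan(events: list[str]) -> bool:
    return occurs_without(events, ["FirstTimeDetected", "Added"], "Scanned")


def removed_without_scan(events: list[str]) -> bool:
    return occurs_without(events, ["Added", "Removed", "DetectionLost"], "Scanned")
```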
[0075] It goes without saying that the generation of process-relevant events is not limited to the above example and may include further steps or a different sequence.
[0076] Finally, Fig. 4 shows a possible application scenario for a device and a method according to an embodiment of the present disclosure.
[0077] Fig. 4 illustrates a shopping aid, shown using a shopping cart 400. The shopping cart 400 can be a standard shopping cart that is pushed by a customer through a supermarket and has a shopping container 402 into which the customer places his purchases.
[0078] An electronic unit 406, representing an embodiment of the device according to the present disclosure, is additionally attached to a handle 404 of the shopping cart 400. The electronic unit 406 can be designed as a battery-powered, embedded system comprising the processing unit described above and the necessary interfaces.
[0079] In the example shown, the electronic unit 406 is additionally equipped with a camera unit 408, which is directed at the shopping container 402 and provides the integrated processing unit with the necessary image data of the shopping container 402. Of course, the camera unit 408 can also be mounted at a different location on the shopping cart 400, with the image data being transmitted to the electronic unit 406 via a wireless or wired connection.
[0080] The electronic unit 406 executes the procedure described above and transmits the process-relevant events determined by the processing unit, for example, to a retail management system or to an end-user device for further processing. The end-user device could, for example, be a customer's smartphone.
[0081] Furthermore, the electronic unit 406 can be configured to process the process-relevant data itself and provide the customer with purchase-relevant information directly. For this purpose, the electronic unit 406 can, for example, be additionally equipped with a display 410 on which the information is shown. The display 410 can also be used to show data provided by a retail management system in response to previously transmitted process-relevant events, such as a list of products currently in the shopping cart, as well as other information provided by the retail management system, such as prices or offers for the respective products.
[0082] The application described above is merely an example of how the device and method can be used to detect process-relevant events in a shopping container. Further variations and extensions are easily conceivable to broaden the scope of application and integrate additional functions. For instance, the electronic unit 406 could be equipped with additional sensors to record further parameters of the shopping process, such as the weight of the goods in the container or the temperature of sensitive products. Such extensions could make the shopping process even more efficient and support customers with detailed information and real-time recommendations.
[0083] Another potential application could be the integration of the system into automated point-of-sale (POS) systems. The captured data could then be transmitted directly to an automated cash register, which would process the payment in real time based on the recognized and processed items. This would significantly speed up the checkout process and eliminate the need for traditional POS systems.
[0084] Additionally, the device could operate even more efficiently by being networked with other systems in the store, such as shelf sensors or warehouse management systems. This integration could enable optimized inventory management and automated control of product replenishment as soon as a certain stock level is reached. This could lead to a reduction in stock shortages and to better product availability, which benefits both the customer and the retailer.
[0085] Furthermore, the electronic unit could also be able to alert customers to special offers or discounts based on the products they place in their shopping cart. By analyzing shopping habits, the system could also provide personalized recommendations that encourage customers to make further purchases.
[0086] Overall, the device offers a wide range of applications that extend beyond simply recording process-relevant events. Its flexible architecture and expandability allow the system to be easily adapted to diverse needs and requirements in the retail sector.
[0087] In summary, this disclosure describes a method and a device for the efficient and precise detection of process-relevant events in a shopping container, particularly in a shopping cart. By employing advanced technologies, such as the combination of deep neural networks and traditional machine learning methods, a resource-efficient and scalable solution is provided. Object detection and tracking are achieved through a complex interplay of object detection, based on detectors such as YOLOv5 or YOLOv8, and advanced tracking techniques, such as deepSORT, strongSORT, OC-SORT, or HybridSORT. A state machine monitors the individual objects throughout the entire shopping process and enables the detailed analysis and classification of state changes, which ultimately lead to the generation of process-relevant events. These events are forwarded via a defined interface to a retail management system or end-user device and can be used to optimize the shopping process and provide real-time information to the customer. The integration of a dedicated AI chip on a common hardware platform enables highly efficient execution of the entire process, while the system's flexible and adaptable structure allows for easy adjustment to various scenarios and application contexts.
[0088] It should be noted that the foregoing embodiments are only exemplary and further variations of individual components are possible to realize embodiments of the following claims. The scope of protection of the present invention is determined by the following claims and is not limited by the features explained in the description or illustrated in the figures.
[0089] Furthermore, the patent application includes real-time recording of products in a physical shopping container using camera-based, AI-supported object recognition, object tracking, re-identification, and subsequent assignment of states to products in order to create a digital shopping container.
[0090] Therefore, the following points are added:
- Fine-granular state machine
- Optional: AI-supported object tracking using AI-supported visual features and motion models
- Optional: Re-identification of lost objects using AI-supported visual features
- Optional: Detection of personal items and bags / baskets / etc., as object classes alongside the Product and Barcode classes
Division of processing into eye and brain (conceptual):
- Eye: Real-time critical components such as object detection, object tracking, and re-identification
- Brain: Non-real-time critical components such as the state machine of objects and the recognition and processing of process-relevant events
[0091] Hand recognition is preferably removed in this process.
[0092] Higher accuracy can be achieved through AI-supported object tracking and re-identification. The revised state machine still allows for the removal of hands in object recognition. Therefore, it has less relevance under GDPR.
[0093] According to one aspect, this task is solved by a method for detecting process-relevant events in a shopping container during a shopping process, comprising:
• Detecting and classifying objects using a camera mounted on the shopping container with the help of a quantized AI model;
• Tracking the detected objects across an image sequence using a tracking method, where each object is assigned a unique tracking number, supported by motion models and visual features derived from a quantized AI model;
• Re-identification of detected objects that cannot be assigned using visual features derived from a quantized AI model and comparison with a database of known lost objects (retrieval);
• Generating a state machine for each tracked object;
• Applying heuristic algorithms to determine process-relevant events based on the states and / or state changes of the generated state machines for each tracked object; as well as
• Providing process-relevant events at a defined interface.
According to another aspect, the task is solved by a device for detecting process-relevant events in a shopping container during a shopping process with a processing unit that is set up to detect objects by means of a camera arranged on the shopping container, to track the detected objects over an image sequence using tracking and re-identification procedures, whereby each object is assigned a unique identification number, to generate a state machine for each tracked object, to apply heuristic algorithms to determine process-relevant events based on states and / or state changes of the generated state machines for each tracked object, and to provide the process-relevant events at a defined interface.
The aforementioned method and the associated device therefore utilize a combination of deep neural networks for object detection, object tracking, and re-identification, as well as traditional machine learning methods such as Kalman filters, decision trees, random forests, or shelf-based algorithms for tracking and the detection of process-relevant events (heuristic algorithms). Furthermore, object detection, tracking, and re-identification can be performed with (near) real-time accuracy, while subsequent detection of process-relevant events can occur at longer intervals or less frequently. By using traditional machine learning methods, which require significantly less computing power, and by dividing the system into real-time critical and non-real-time critical components, the overall overhead can be considerably reduced. This leads to better resource utilization, which is particularly important for embedded systems with limited hardware capacity, such as those commonly found in mobile or edge devices. This increase in efficiency makes it possible to reliably perform complex detection tasks even on such devices without unduly impacting their performance or battery life. Furthermore, combining these methods offers greater flexibility in scaling the solution. While deep neural networks are optimized for accurate object recognition and the creation of visual features for object tracking and re-identification, the more traditional methods allow for fast and resource-efficient analysis of the recognized objects and their states. [...]
In a further embodiment, the process-relevant events can include at least the adding, removing, or covering of objects, as well as the covering of the camera, as an event. This has the advantage that not only basic actions such as adding or removing items from the shopping container are recorded, but also more complex scenarios such as the intentional or unintentional covering of objects or the camera can be monitored and mapped.
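The division into real-time critical and non-real-time critical components described above can be illustrated with a minimal scheduling sketch. It assumes that detection and tracking run on every frame while the event evaluation runs only on every n-th frame over the buffered results; the class name `Pipeline`, the callables passed in, and the interval are hypothetical.

```python
from typing import Any, Callable


class Pipeline:
    """Runs the real-time part per frame and the event logic at longer intervals."""

    def __init__(self,
                 detect_and_track: Callable[[Any], Any],
                 evaluate_events: Callable[[list], list],
                 brain_every_n_frames: int = 5):   # assumed interval
        self.detect_and_track = detect_and_track   # real-time critical part
        self.evaluate_events = evaluate_events     # non-real-time critical part
        self.n = brain_every_n_frames
        self.frame_index = 0
        self.pending: list = []

    def process_frame(self, frame: Any) -> list:
        # Real-time part: executed for every single frame.
        self.pending.append(self.detect_and_track(frame))
        self.frame_index += 1
        # Event part: executed only every n-th frame on the buffered results.
        if self.frame_index % self.n == 0:
            batch, self.pending = self.pending, []
            return self.evaluate_events(batch)
        return []
```

On real hardware the two parts could also run in separate threads or on separate processing units; the sketch only shows the difference in execution frequency.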
In a further embodiment, object detection can include classifying the objects into at least two classes, with a first class being assigned objects that correspond to purchases and a second class being assigned objects that correspond to barcodes. In a further embodiment, the detection of objects can include a classification into at least two further classes, with a third class being assigned objects corresponding to personal items and a fourth class being assigned objects corresponding to baskets or bags. In a further development, the process-relevant events can include, as an event, the distinction between products, products to which a barcode has been assigned, personal items, and baskets or bags. In a further embodiment, each state machine can distinguish between at least 11 states. The configuration in which each state machine can distinguish between at least eleven states, [...] In particular, the state machine can assume a state if ...
• a tracked object is detected for the first time (308)
• a barcode is detected on the object (310)
• a barcode is successfully scanned / captured (312)
• the object is added to the shopping container or is in the shopping container and can be re-identified (302)
• the object is removed from the shopping container (304)
• the object is completely removed from the field of vision after removal (306)
• the object is added to the shopping container without a recognized or captured barcode, or is located in the shopping container and cannot be assigned by re-identification (318)
• the object is removed from the shopping container without a recognized or recorded barcode (320)
• the object is completely removed from the field of vision after removal without a recognized or captured barcode (322)
Procedure 200 begins with the detection of objects (step 202) based on image data captured by a camera unit mounted on the shopping container. These objects are identified in a continuous image sequence.
Object detection can be achieved using advanced detectors such as YOLO models. These detectors are characterized by their high accuracy and efficiency, as they are capable of recognizing objects in real time, even in complex scenes or with a multitude of objects in the field of view. YOLO models use optimized neural networks specifically trained to detect objects in various environments, thus ensuring precise and reliable detection.
Following object detection, a tracking algorithm (step 204) is applied, assigning a unique tracking number to each detected object. This tracking algorithm can be based on advanced methods such as deepSORT, strongSORT, OC-SORT, or HybridSORT. These algorithms combine the efficiency of the original SORT algorithm with additional features that improve the accuracy and robustness of the tracking. DeepSORT enhances tracking by integrating deep neural networks to enable precise re-identification of objects, even if they temporarily disappear from view. StrongSORT and OC-SORT offer even more robust algorithms specifically optimized for handling rapid scene changes and sudden object movements. HybridSORT combines different approaches to leverage the strengths of each method and provide a highly adaptable tracking solution. Furthermore, the tracking method can be extended both by short-term re-identification, in which visual features are used to improve the assignment of objects within short periods of time, and by long-term, database-supported re-identification for objects that have disappeared from view for longer periods of time.
Fig. 3 shows a schematic diagram of a state machine 300, which represents the different states of an object in the shopping container during the shopping process. The state machine begins with the state “New” 308, in which a new object is recognized and a corresponding tracking ID is created. The diagram then splits into two paths.
In the first path, the object is classified as “Scanning” (310) if a barcode on the object is detected. If the barcode has not yet been successfully scanned / captured and the object is added to the shopping container, the object switches to the second path. If the barcode is successfully scanned / captured, the object can transition to the “Scanned” state (312). From there, the object can transition to the “Added” state (302) when it is added to the shopping container, and subsequently to the “Moving Out” state (304) when the object is removed from the shopping container. Finally, the object can transition to state “Removed” 306 if it is completely removed from view after removal.
In the second path, the object is classified as “Added Unknown” 318 if it was added to the shopping cart without a recognized or scanned / captured barcode. From this state, the object can be removed from the shopping cart and move to state “Moving Out Unknown” 320. It can then move to state “Removed Unknown” 322 if it is completely removed from view after removal, or to “Scanning Added” 314 if a barcode is detected on the object. If a barcode is successfully scanned / captured in state “Scanning Added” 314, the object can move to state “Scanned Added” 316. Adding the object back to the shopping cart will move it to state “Added” 302, or removing it from the cart and completely from view will move it to state “Removed” 306. If no barcode is scanned / captured in state “Scanning Added” 314, the object can revert to state “Added Unknown” 318 when the object is added to the shopping container, or to state “Removed Unknown” 322 when the object is removed from the shopping container and completely removed from view. Upon reaching the state “Removed” 306 or “Removed Unknown” 322, the object's state machine is completely destroyed and no further state transitions are possible.
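The eleven states and the transitions described above can be written down as a transition table. The state names and reference numerals follow the description; the trigger names (`barcode_detected`, `barcode_captured`, `added`, `moved_out`, `out_of_view`) are hypothetical labels for the conditions named in the text, and unknown triggers are assumed to leave the state unchanged.

```python
from enum import Enum


class S(Enum):
    ADDED = 302
    MOVING_OUT = 304
    REMOVED = 306
    NEW = 308
    SCANNING = 310
    SCANNED = 312
    SCANNING_ADDED = 314
    SCANNED_ADDED = 316
    ADDED_UNKNOWN = 318
    MOVING_OUT_UNKNOWN = 320
    REMOVED_UNKNOWN = 322


# Terminal states: the state machine is destroyed once one is reached.
TERMINAL = {S.REMOVED, S.REMOVED_UNKNOWN}

TRANSITIONS = {
    S.NEW: {"barcode_detected": S.SCANNING, "added": S.ADDED_UNKNOWN},
    S.SCANNING: {"barcode_captured": S.SCANNED, "added": S.ADDED_UNKNOWN},
    S.SCANNED: {"added": S.ADDED},
    S.ADDED: {"moved_out": S.MOVING_OUT},
    S.MOVING_OUT: {"out_of_view": S.REMOVED},
    S.ADDED_UNKNOWN: {"moved_out": S.MOVING_OUT_UNKNOWN},
    S.MOVING_OUT_UNKNOWN: {"out_of_view": S.REMOVED_UNKNOWN,
                           "barcode_detected": S.SCANNING_ADDED},
    S.SCANNING_ADDED: {"barcode_captured": S.SCANNED_ADDED,
                       "added": S.ADDED_UNKNOWN,
                       "out_of_view": S.REMOVED_UNKNOWN},
    S.SCANNED_ADDED: {"added": S.ADDED, "out_of_view": S.REMOVED},
}


def step(state: S, trigger: str) -> S:
    """Apply one trigger; unknown triggers keep the current state."""
    if state in TERMINAL:
        raise ValueError(f"state machine already destroyed in {state.name}")
    return TRANSITIONS[state].get(trigger, state)


def run(triggers: list[str], start: S = S.NEW) -> S:
    state = start
    for trigger in triggers:
        state = step(state, trigger)
    return state
```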
If an object in a previous state is no longer recognized and tracked, it is paused and added to the list of inactive objects. If the system recognizes and tracks a new object, it compares it to the list of inactive objects using re-identification. If the object can be uniquely re-identified and is located in the shopping container, it is removed from the list, no new state machine is created, and it transitions to the "Added" state (302). If no object is uniquely re-identified, a new state machine is created. If the new object is outside the shopping container, it transitions to the "New" state (308); if it is inside the shopping container, it transitions to the "Added Unknown" state (318).
The generation of process-relevant events takes place as the final step in the described procedure and is based on the analysis of states and state changes within the state machine and the application of heuristic algorithms. These events are then provided via a defined interface, such as MQTT, and made available to a retailer application. The identified events can include, among others, the initial detection of an object (New), the capture of a barcode (Scanning), the successful scanning of an object (Scanned), as well as the addition (Added) and removal (Removed) of objects (see Fig. 3). Other relevant events include, for example, the movement of an object, which can be determined from its position, velocity, and acceleration by comparing the object's velocity to a defined threshold or by classifiers (e.g., decision trees, random forests, or neural networks). If a product is lost, this can be detected when the object detector no longer recognizes the product in a current frame. If the number of frames in which the product could not be tracked exceeds a defined threshold, the object's track and state machine are added to a list of inactive objects for later re-identification. Another event that can be detected is adding an item without prior scanning.
This is identified when the sequence of events "New", optionally followed by "Scanning", followed by "Added Unknown", occurs without a "Scanned" event before the item is added to the shopping cart. Similarly, removing an item without a successfully scanned / captured barcode can be detected by the "Removed Unknown" event.
In summary, this disclosure describes a method and a device for the efficient and precise detection of process-relevant events in a shopping container, particularly in a shopping cart. By employing advanced technologies, such as the combination of deep neural networks and traditional machine learning methods, a resource-efficient and scalable solution is provided. Object detection and tracking are achieved through a complex interplay of object detection, based on detectors such as YOLO models, advanced tracking methods such as deepSORT, strongSORT, OC-SORT, or HybridSORT, and short-term and long-term re-identification of objects. A state machine monitors the individual objects throughout the entire purchasing process and enables the detailed analysis and classification of state changes, which ultimately lead to the generation of process-relevant events. These events are forwarded via a defined interface to a retail management system or an end-user device and can be used to optimize the purchasing process and provide real-time information to the customer. The integration of a dedicated AI chip on a common hardware platform enables highly efficient execution of the entire process, while the system's flexible and adaptable structure allows for easy adaptation to various scenarios and application contexts.
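The comparison of a newly tracked object with the list of inactive objects, as described above for re-identification, can be sketched with a similarity search over feature vectors. The sketch assumes that each object is represented by a visual feature vector (in practice produced by the quantized AI model), that cosine similarity is used, and that a match counts as unique only if it exceeds a minimum similarity and is clearly better than the second-best candidate. The function names and both thresholds are assumptions.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def reidentify(feature: list[float],
               inactive: dict[int, list[float]],
               min_similarity: float = 0.8,   # assumed threshold
               min_margin: float = 0.1) -> Optional[int]:  # assumed margin
    """Return the track ID of a uniquely matching inactive object, or None."""
    scored = sorted(
        ((cosine_similarity(feature, f), tid) for tid, f in inactive.items()),
        reverse=True,
    )
    if not scored or scored[0][0] < min_similarity:
        return None   # nothing similar enough: a new state machine is needed
    if len(scored) > 1 and scored[0][0] - scored[1][0] < min_margin:
        return None   # ambiguous: not uniquely re-identified
    return scored[0][1]
```

If a track ID is returned, the object would be removed from the list of inactive objects and continue with its existing state machine; otherwise a new state machine would be created.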
Claims
1. Method (200) for detecting process-relevant events in a shopping container during a shopping process, comprising: Detecting (202) objects using a camera mounted on the shopping container; Tracking (204) the detected objects across an image sequence using a tracking method, whereby each object is assigned a unique tracking number; Generating (206) a state machine for each tracked object; Applying (208) heuristic algorithms to determine process-relevant events based on states and / or state changes of the generated state machines for each tracked object; as well as Providing (210) the process-relevant events at a defined interface.
2. The method according to claim 1, further comprising: Detecting barcodes on the tracked objects; Executing a barcode scanning process in which each scanned barcode is associated with at least one tracked object; and Triggering a state transition of each state machine of a tracked object to which a barcode has been assigned.
3. Method according to claim 2, wherein the process-relevant events are provided only for the tracked objects to which a barcode has been assigned.
4. Method according to any one of claims 1 to 3, wherein the process-relevant events include at least adding, removing or covering objects as an event.
5. Method according to any one of claims 1 to 4, wherein the detection (202) of objects includes neural object recognition.
6. Method according to any one of claims 1 to 5, wherein the detection (202) of objects includes classifying the objects into at least three classes, wherein a first class is assigned objects corresponding to human hands, a second class is assigned objects corresponding to shopping items, and a third class is assigned objects corresponding to barcodes.
7. Method according to any one of claims 1 to 6, wherein the tracking method (204) is a SORT method.
8. The method of claim 7, wherein the SORT method is a deepSORT method, a strongSORT method, an OC-SORT and / or a hybridSORT method.
9. Method according to any one of claims 1 to 8, wherein each state machine (300) distinguishes between at least 7 states.
10. Method according to any one of claims 1 to 9, wherein the state machine (300) assumes a first state when a tracked object is first detected, a second state when detection of the object is lost, a third state when a barcode on the object is detected, a fourth state when the object is added to the shopping container, a fifth state when the object has already been added to the shopping container, a sixth state when the object is removed from the shopping container, and a seventh state when the object is detected again.
11. A method according to any one of claims 1 to 10, wherein the method steps are performed on a common hardware platform.
12. Method according to any one of claims 1 to 11, wherein the hardware platform is at least partially integrated into the shopping container.
13. The method of claim 12, wherein the hardware platform comprises at least one dedicated AI chip.
14. Method according to any one of claims 1 to 13, wherein the defined interface is an interface to a retail management system or an end-user facility.
15. Device (100) for detecting process-relevant events in a shopping container during a shopping process, comprising a processing unit (102) which is configured to detect objects using a camera mounted on the shopping container, to track the detected objects across an image sequence using a tracking method, whereby each object is assigned a unique identification number, to generate a state machine for each tracked object, to apply heuristic algorithms to determine process-relevant events based on the states and / or state changes of the generated state machines for each tracked object, and to provide the process-relevant events at a defined interface.