A restaurant kitchen waste detection and intelligent inventory prediction system
The restaurant kitchen waste detection and intelligent inventory prediction system uses camera capture and video stitching to monitor the entry, exit, and disposal of ingredients in real time. This addresses the delay and incomplete coverage of existing restaurant kitchen waste detection, and enables real-time waste detection and inventory management.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHENGDU XIAOSHIDA INTELLIGENT TECHNOLOGY CO LTD
- Filing Date
- 2026-03-05
- Publication Date
- 2026-06-19
Figure CN122243352A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of catering technology, and specifically relates to a kitchen waste detection and intelligent inventory prediction system for catering establishments.
Background Technology
[0002] Food waste in restaurant kitchens refers to edible ingredients that, at any stage from receiving through processing, cutting, and cooking, fail to serve their intended cooking purpose and are ultimately discarded, excessively consumed, or left idle, so that they are never converted into consumer value. Typical causes include non-standard operating procedures, inaccurate demand forecasting, and improper technique. Such waste directly reduces a restaurant's profit, so it needs to be monitored and reduced.
[0003] Currently, the detection of food waste in restaurants typically involves using cameras to record daily videos of the establishment. Managers then use these videos to assess whether waste is occurring, but this method is time-consuming and only detects waste at a limited number of locations, lacking comprehensiveness.
[0004] In view of this, a kitchen waste detection and intelligent inventory prediction system for catering establishments is designed to solve the above problems.
Summary of the Invention
[0005] To address the problems described in the background section, this invention provides a kitchen waste detection and intelligent inventory prediction system for catering establishments. The system integrates waste detection, inventory prediction, and inventory fault management, and these components complement one another to make the system's detection and prediction more comprehensive.
[0006] To achieve the above objectives, the present invention provides the following technical solution: a kitchen waste detection and intelligent inventory prediction system for catering establishments, comprising:
A video capture module, which uses cameras to capture video of the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area;
A video stitching module, in which the control terminal stitches the videos collected from the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area into a complete video and generates video tags;
A food ingredient entry identification module, in which the control terminal identifies the target food ingredients in the part of the stitched video whose video tag indicates food ingredient entry;
An inbound volume calculation module, in which the control terminal calculates the inbound volume of the identified target ingredients and generates an ingredient label based on the inbound volume;
A food outbound identification module, in which the control terminal identifies the target food in the part of the stitched video whose video tag indicates food outbound, including the edible and inedible parts of the target food;
An outbound volume calculation module, in which the control terminal calculates the outbound volume of the food based on the identified target food, and associates the inbound and outbound volumes of the food based on the ingredient label;
A waste volume calculation module, in which the control terminal calculates the waste volume of the food based on the identified inedible parts of the target food;
A discarded food identification module, in which the control terminal identifies the target food items in the part of the stitched video whose video tag indicates discarded food;
A discard volume calculation module, in which the control terminal calculates the discard volume of the food based on the identified target food;
A food waste detection module, in which the control terminal detects food waste by comparing the calculated waste volume with the calculated discard volume;
An inventory forecasting module, in which the control terminal increases or decreases the inventory quantity according to the identified incoming and outgoing ingredients to predict inventory;
An inventory management module, in which the control terminal estimates the outbound volume of the ingredients from the calculated inbound volume and the preset volume shrinkage rate of ingredients in the storage area, compares this estimate with the associated outbound volume to determine whether there is a fault in the inventory area, and completes fault management.
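The inventory forecasting module is described only as increasing or decreasing quantities for identified incoming and outgoing ingredients. A minimal sketch of that bookkeeping follows; the event tuple layout and the names `forecast_inventory`, `"inbound"`, and `"outbound"` are illustrative assumptions, not terms defined by the patent.

```python
from collections import defaultdict

def forecast_inventory(opening_stock, events):
    """Apply inbound (+) and outbound (-) events to an opening stock.

    opening_stock: dict mapping ingredient name to quantity (assumed layout).
    events: iterable of (kind, ingredient, quantity) tuples (assumed layout).
    """
    stock = defaultdict(float, opening_stock)
    for kind, ingredient, quantity in events:
        if kind == "inbound":
            stock[ingredient] += quantity   # identified incoming ingredient
        elif kind == "outbound":
            stock[ingredient] -= quantity   # identified outgoing ingredient
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return dict(stock)

events = [
    ("inbound", "cabbage", 10),
    ("outbound", "cabbage", 4),
    ("inbound", "carrot", 3),
]
print(forecast_inventory({"cabbage": 2}, events))
```

Keeping the events as a plain list means the same log can be replayed later when inbound and outbound volumes are reconciled.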
[0007] Furthermore, the video stitching module stitches the videos of the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area through the following steps:
A checkerboard calibration board is placed within the common field of view of the cameras in the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area, and multiple calibration images are collected;
Using OpenCV and Zhang's calibration method built into the control terminal, the intrinsic parameter matrix and distortion coefficients of each camera are calculated from the coordinates in the multiple calibration images;
The camera in the kitchen food storage area is used as the reference camera, and the cameras in the kitchen food out storage area and the kitchen waste bin area are used as the cameras to be stitched;
Multiple non-collinear calibration objects are placed within the common field of view of the cameras in the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area, and multiple calibration images are collected;
The solvePnP algorithm solves the extrinsic parameter matrix of each camera to be stitched from the world coordinates and pixel coordinates of the non-collinear calibration objects;
Video frames captured by the cameras in the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area are aligned by timestamp;
Each video frame is corrected using the calculated distortion coefficients of its camera;
Noise in the video frames is reduced with 3×3 Gaussian filtering and 5×5 median filtering;
Taking the resolution of the reference camera as the standard, the video frames of the cameras to be stitched are resampled to that resolution with a bilinear interpolation algorithm;
Feature points of the video frames are extracted with the ORB algorithm: FAST corner detection first detects high-contrast pixels as candidate feature points, the gray-scale centroid method then calculates the principal direction of each candidate feature point, and finally a rotation-invariant BRIEF descriptor is generated;
Using the current video frame of the reference camera as the reference frame and the current video frame of the camera to be stitched as the frame to be matched, the BRIEF descriptors are matched by the FLANN matcher, a Hamming distance threshold is set to filter the matching pairs, false matches are eliminated by the RANSAC algorithm, and the homography matrix is determined;
Based on the homography matrix, warpPerspective maps the coordinate system of the frame to be matched to the coordinate system of the reference frame, which performs the spatial alignment and completes the stitching.
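The descriptor matching step filters pairs by a Hamming distance threshold. The sketch below illustrates only that filtering idea on binary descriptors. It deliberately swaps in an exhaustive (brute-force) nearest-neighbour search in place of the FLANN matcher named in the text, and the function names and the threshold value are assumptions chosen for illustration.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def match_descriptors(reference, candidates, max_distance):
    """For each reference descriptor, keep its nearest candidate if it is
    within max_distance bits. Brute-force search stands in for FLANN here."""
    matches = []
    for i, d_ref in enumerate(reference):
        best_j, best_dist = None, None
        for j, d_cand in enumerate(candidates):
            dist = hamming(d_ref, d_cand)
            if best_dist is None or dist < best_dist:
                best_j, best_dist = j, dist
        if best_j is not None and best_dist <= max_distance:
            matches.append((i, best_j, best_dist))
    return matches

# Toy 16-bit descriptors; real BRIEF descriptors from ORB are 256 bits long.
ref = [b"\x0f\x00", b"\xff\xff"]
cand = [b"\xff\xfe", b"\x0e\x00", b"\x00\xf0"]
print(match_descriptors(ref, cand, max_distance=2))
```

The surviving pairs would then be passed to RANSAC, which discards matches that are inconsistent with a single homography.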
[0008] Furthermore, the video tag generation step of the video stitching module includes: The start and end video frames of the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area are labeled with video frame tags, including "Food storage begins", "Food storage ends", "Food out storage begins", "Food out storage ends", "Food waste begins", and "Food waste ends".
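The begin and end tags above delimit the parts of the stitched video that later modules process. A minimal sketch of pairing them into segments follows; the tag strings are taken from the text, while the segment kinds (`inbound`, `outbound`, `discard`) and the `(frame_index, tag)` input layout are assumptions made for illustration.

```python
# Start tag -> (matching end tag, segment kind). Kinds are hypothetical names.
TAG_PAIRS = {
    "Food storage begins": ("Food storage ends", "inbound"),
    "Food out storage begins": ("Food out storage ends", "outbound"),
    "Food waste begins": ("Food waste ends", "discard"),
}
END_TO_KIND = {end: kind for end, kind in TAG_PAIRS.values()}

def segments_from_tags(tagged_frames):
    """Pair start/end tags into (kind, start_frame, end_frame) segments.

    tagged_frames: iterable of (frame_index, tag), sorted by frame_index.
    """
    open_starts = {}
    segments = []
    for index, tag in tagged_frames:
        if tag in TAG_PAIRS:
            open_starts[TAG_PAIRS[tag][1]] = index
        elif tag in END_TO_KIND:
            kind = END_TO_KIND[tag]
            if kind in open_starts:            # ignore an end without a start
                segments.append((kind, open_starts.pop(kind), index))
        else:
            raise ValueError(f"unknown tag: {tag}")
    return segments

frames = [
    (10, "Food storage begins"), (50, "Food storage ends"),
    (60, "Food waste begins"), (90, "Food waste ends"),
]
print(segments_from_tags(frames))
```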
[0009] Furthermore, the target ingredient identification step of the ingredient entry identification module includes: The control terminal has a built-in YOLOv8n model pre-trained on historical food images. The YOLOv8n model identifies the target food in the part of the stitched video marked by the video frame tags "Food storage begins" and "Food storage ends", and outputs the target bounding box coordinates to complete the target food identification. The target ingredient identification steps of the ingredient outbound identification module are the same as those of the ingredient entry identification module, except that the video frame tags in the stitched video are "Food out storage begins" and "Food out storage ends". The target ingredient identification steps of the discarded ingredient identification module are the same as those of the ingredient entry identification module, except that the video frame tags in the stitched video are "Food waste begins" and "Food waste ends".
[0010] Furthermore, the inbound volume calculation module calculates the inbound volume of the food through the following steps:
The pixel dimensions of a standard reference object of known size are obtained under the camera;
The conversion scaling factor is calculated from the actual size and pixel size of the standard reference object;
Using OpenCV, images containing only the food ingredients are cropped from the video according to the output bounding box coordinates, and the food ingredient images and category labels are output;
The control terminal has a built-in monocular depth estimation model pre-trained on historical food images, which calculates the distance from each pixel within the target bounding box of the output food image to the camera and thereby generates a depth map;
All depth values in the depth map generated from the food image are averaged to obtain the average depth, which is then converted into the actual average height using the conversion scaling factor;
The pixels within the target bounding box of the food image are counted, and the effective number of pixels in the food image is calculated from the converted actual average height;
The volume of each effective pixel within the target bounding box of the food image is calculated, and the volumes are summed to obtain the inbound volume of the food.
The steps for calculating the outbound volume in the outbound volume calculation module, the waste volume in the waste volume calculation module, and the discard volume in the discard volume calculation module are each the same as the steps for calculating the inbound volume in the inbound volume calculation module.
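The expressions for these steps are not reproduced in this text, so the following is only one possible reading of them, not the patent's formulas. It assumes the scaling factor is real length per pixel, that average height is the average depth multiplied by that factor, that "effective" pixels are those with a positive depth value, and that each effective pixel contributes a column of footprint k² and the average height. All function and parameter names are hypothetical.

```python
def scale_factor(actual_length_cm: float, pixel_length: float) -> float:
    """Real-world centimetres represented by one pixel (assumed definition)."""
    if pixel_length <= 0:
        raise ValueError("pixel_length must be positive")
    return actual_length_cm / pixel_length

def estimate_volume(depth_values, k: float) -> float:
    """Rough volume in cubic centimetres from the depth values in a box.

    depth_values: flat list of per-pixel depth values, in pixel-equivalent
    units (an assumption); pixels with depth <= 0 are treated as background.
    """
    if not depth_values:
        return 0.0
    mean_depth = sum(depth_values) / len(depth_values)   # average of all values
    height_cm = mean_depth * k                           # assumed conversion
    effective = sum(1 for d in depth_values if d > 0)    # assumed criterion
    pixel_volume = (k * k) * height_cm                   # footprint x height
    return effective * pixel_volume

k = scale_factor(actual_length_cm=10.0, pixel_length=100)
print(estimate_volume([20, 20, 0, 20], k))
```

Because the same steps are reused for the outbound, waste, and discard volumes, a single function of this kind could serve all four modules.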
[0011] Furthermore, the food waste detection steps of the food waste detection module include: If the discard volume (the volume of food identified in the waste bin area) is greater than the waste volume (the volume of the inedible parts identified at outbound), waste is detected. If the discard volume equals the waste volume, no waste is detected.
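The comparison can be sketched as follows. Because both volumes come from image-based estimates, exact equality is unlikely in practice, so this sketch adds a relative tolerance; that tolerance, its default value, and the treatment of the unspecified "discard smaller than waste" case as no waste are all assumptions.

```python
import math

def detect_waste(discard_volume: float, waste_volume: float,
                 rel_tol: float = 0.05) -> bool:
    """Return True when the discard volume exceeds the waste volume.

    Volumes within rel_tol of each other are treated as equal (assumption).
    A discard volume below the waste volume is not covered by the text and
    is reported here as no waste (assumption).
    """
    if math.isclose(discard_volume, waste_volume, rel_tol=rel_tol):
        return False
    return discard_volume > waste_volume

print(detect_waste(1.5, 1.0))
print(detect_waste(1.02, 1.0))
```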
[0012] Furthermore, the volume shrinkage rate of ingredients in the ingredient storage area is preset in the inventory management module as follows: Each type of ingredient is placed in its corresponding storage area. 3D scanning is used to obtain the initial volume of each local area of the ingredients in every direction, and then to collect the storage volume of each local area in every direction at regular intervals. The storage volume shrinkage rate of each local area is calculated from the ratio between its initial volume and its storage volume in each period. The shrinkage rates of the local areas are then compared: if the rate is the same in all directions, a single storage volume shrinkage rate is retained; if the rates differ between directions, all of the differing rates are retained and associated with their corresponding local areas. The retained storage volume shrinkage rate data are then preset.
[0013] Furthermore, the food outbound volume estimation step of the inventory management module includes: estimating and updating the food outbound volume as a product involving the calculated inbound volume and the preset storage volume shrinkage rate.
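The shrinkage expression itself is not reproduced in this text. The sketch below assumes the rate is the fractional volume loss relative to the initial volume, which is only one way to form a rate from the two scanned volumes, and it uses a small tolerance to decide whether rates are "the same". The names `shrinkage_rate`, `preset_rates`, and the `"all"` key are hypothetical.

```python
import math

def shrinkage_rate(initial_volume: float, stored_volume: float) -> float:
    """Fractional volume loss during storage (assumed definition)."""
    if initial_volume <= 0:
        raise ValueError("initial_volume must be positive")
    return (initial_volume - stored_volume) / initial_volume

def preset_rates(initial_by_area, stored_by_area, abs_tol: float = 1e-3):
    """Compute one rate per local area and collapse them when they agree.

    Returns {"all": rate} when every area shrinks at the same rate,
    otherwise a dict mapping each local area to its own rate.
    """
    rates = {
        area: shrinkage_rate(initial_by_area[area], stored_by_area[area])
        for area in initial_by_area
    }
    values = list(rates.values())
    if values and all(math.isclose(v, values[0], abs_tol=abs_tol) for v in values):
        return {"all": values[0]}
    return rates

print(preset_rates({"top": 100, "bottom": 100}, {"top": 90, "bottom": 80}))
```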
[0014] Furthermore, the steps for determining whether a fault exists in the inventory area of the inventory management module include: If the estimated outbound volume of food is less than the calculated outbound volume of food, then there is a fault in the storage area. If the estimated outbound volume of ingredients equals the calculated outbound volume of ingredients, then there is no fault in the storage area.
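The fault rule can be sketched directly from the two cases stated above. How the estimate is formed is an assumption here (inbound volume scaled by one minus a fractional shrinkage rate), as are the tolerance used for "equals" and the handling of the unstated case where the estimate is larger than the calculated volume.

```python
import math

def estimate_outbound(inbound_volume: float, shrinkage_rate: float) -> float:
    """Expected outbound volume after storage shrinkage (assumed formula)."""
    return inbound_volume * (1.0 - shrinkage_rate)

def storage_fault(estimated: float, calculated: float,
                  rel_tol: float = 0.05) -> bool:
    """True when the estimated outbound volume is less than the calculated one.

    Values within rel_tol count as equal (assumption). An estimate above the
    calculated volume is not covered by the text and returns False here.
    """
    if math.isclose(estimated, calculated, rel_tol=rel_tol):
        return False
    return estimated < calculated

estimated = estimate_outbound(inbound_volume=100.0, shrinkage_rate=0.1)
print(storage_fault(estimated, calculated=120.0))
```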
[0015] Compared with the prior art, the beneficial effects of the present invention are as follows: through the cooperation of its modules, the invention can detect waste in restaurant kitchens in real time. While detecting kitchen waste, it also performs inventory forecasting and inventory fault detection. Waste detection, inventory forecasting, and inventory fault management are thus integrated into one system and complement one another, which improves the comprehensiveness of the system's detection and prediction.
Attached Figure Description
[0016] Figure 1 is a system structure diagram of the present invention. In the diagram: 1. Video acquisition module; 2. Video stitching module; 3. Food ingredient entry identification module; 4. Inbound volume calculation module; 5. Food ingredient exit identification module; 6. Exit volume calculation module; 7. Waste volume calculation module; 8. Discarded food ingredient identification module; 9. Discarded food ingredient volume calculation module; 10. Food waste detection module; 11. Inventory forecasting module; 12. Inventory management module.
Detailed Implementation
[0017] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0018] A kitchen waste detection and intelligent inventory prediction system for catering establishments includes:
Video acquisition module 1, which captures video of the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area via cameras. Cameras are installed in the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area to collect high-definition video of these areas.
Video stitching module 2, in which the control terminal stitches the collected videos of the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area into a complete video and generates video tags. A control terminal is installed in the management area of the catering store. It is electrically connected to the cameras in the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area, and receives the video they capture. The control terminal includes a control computer.
The steps for stitching the videos of the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area include:
A checkerboard calibration board is placed within the common field of view of the cameras in the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area, and multiple calibration images are collected;
Using OpenCV and Zhang's calibration method built into the control terminal, the intrinsic parameter matrix and distortion coefficients of each camera are calculated from the coordinates in the multiple calibration images;
The camera in the kitchen food storage area is used as the reference camera, and the cameras in the kitchen food out storage area and the kitchen waste bin area are used as the cameras to be stitched;
Multiple non-collinear calibration objects are placed within the common field of view of the cameras in the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area, and multiple calibration images are collected;
The solvePnP algorithm solves the extrinsic parameter matrix of each camera to be stitched from the world coordinates and pixel coordinates of the non-collinear calibration objects;
Video frames captured by the cameras in the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area are aligned by timestamp;
Each video frame is corrected using the calculated distortion coefficients of its camera;
Noise in the video frames is reduced with 3×3 Gaussian filtering and 5×5 median filtering;
Taking the resolution of the reference camera as the standard, the video frames of the cameras to be stitched are resampled to that resolution with a bilinear interpolation algorithm;
Feature points of the video frames are extracted with the ORB algorithm: FAST corner detection first detects high-contrast pixels as candidate feature points, the gray-scale centroid method then calculates the principal direction of each candidate feature point, and finally a rotation-invariant BRIEF descriptor is generated;
Using the current video frame of the reference camera as the reference frame and the current video frame of the camera to be stitched as the frame to be matched, the BRIEF descriptors are matched by the FLANN matcher, a Hamming distance threshold is set to filter the matching pairs, false matches are eliminated by the RANSAC algorithm, and the homography matrix is determined;
Based on the homography matrix, warpPerspective maps the coordinate system of the frame to be matched to the coordinate system of the reference frame, which performs the spatial alignment and completes the stitching.
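The timestamp alignment step in the stitching procedure is not detailed in the text. One simple way to realize it, sketched here under assumptions, is to pair each reference-camera frame with the nearest frame of another camera and drop pairs whose gap is too large; the millisecond units, the `max_gap_ms` parameter, and the function names are illustrative.

```python
from bisect import bisect_left

def nearest_index(timestamps, t):
    """Index of the value in a sorted, non-empty list that is closest to t."""
    pos = bisect_left(timestamps, t)
    if pos == 0:
        return 0
    if pos == len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[pos - 1], timestamps[pos]
    return pos if (after - t) < (t - before) else pos - 1

def align_frames(ref_ts, other_ts, max_gap_ms):
    """Pair each reference frame with the nearest frame of another camera.

    ref_ts, other_ts: sorted capture timestamps in milliseconds (assumed).
    Returns (ref_index, other_index) pairs whose gap is within max_gap_ms.
    """
    pairs = []
    if not other_ts:
        return pairs
    for i, t in enumerate(ref_ts):
        j = nearest_index(other_ts, t)
        if abs(other_ts[j] - t) <= max_gap_ms:
            pairs.append((i, j))
    return pairs

print(align_frames([0, 40, 80], [10, 50, 200], max_gap_ms=20))
```

Frames left unpaired would simply be skipped for stitching, which avoids warping together views captured at noticeably different moments.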
The steps for generating video tags include: The start and end video frames of the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area are labeled with video frame tags, including "Food storage begins", "Food storage ends", "Food out storage begins", "Food out storage ends", "Food waste begins", and "Food waste ends".
Food ingredient entry identification module 3, in which the control terminal identifies the target food ingredients in the part of the stitched video marked by the video frame tags "Food storage begins" and "Food storage ends". The steps for identifying target ingredients include: The control terminal has a built-in YOLOv8n model pre-trained on historical food images. The YOLOv8n model identifies the target food in the part of the stitched video marked by the video frame tags "Food storage begins" and "Food storage ends", and outputs the target bounding box coordinates to complete the target food identification.
Inbound volume calculation module 4, in which the control terminal calculates the inbound volume of the identified target ingredients and generates an ingredient label based on the inbound volume. The steps for calculating the inbound volume of the ingredients are:
The pixel dimensions of a standard reference object of known size are obtained under the camera;
The conversion scaling factor is calculated from the actual size and pixel size of the standard reference object; the expression is: [expression missing in source];
Using OpenCV, images containing only the food ingredients are cropped from the video according to the output bounding box coordinates, and the food ingredient images and category labels are output;
The control terminal has a built-in monocular depth estimation model pre-trained on historical food images, which calculates the distance from each pixel within the target bounding box of the output food image to the camera and thereby generates a depth map;
All depth values in the depth map generated from the food image are averaged to obtain the average depth, which is then converted to the actual average height using the conversion scaling factor; the expression is: [expression missing in source];
The pixels within the target bounding box of the food image are counted, and the effective number of pixels in the food image is calculated from the converted actual average height; the expression is: [expression missing in source];
The volume of each effective pixel within the target bounding box of the food image is calculated, and the volumes are summed to obtain the inbound volume of the food; the pixel volume expression is: [expression missing in source]; the inbound volume expression is: [expression missing in source].
The steps for generating ingredient labels include: Ingredient labels are generated in the format "ingredient number + ingredient volume" and superimposed by the control terminal on the target bounding box of the target ingredient identified by the YOLOv8n model.
Food outbound identification module 5, in which the control terminal identifies the target food in the part of the stitched video marked by the video frame tags "Food out storage begins" and "Food out storage ends", including the edible and inedible parts of the target food. The steps for identifying target ingredients include: The control terminal has a built-in YOLOv8n model pre-trained on historical food images. The YOLOv8n model identifies the target food in the part of the stitched video marked by the video frame tags "Food out storage begins" and "Food out storage ends", including the edible and inedible parts of the target food, and outputs the target bounding box coordinates to complete the target food identification.
Outbound volume calculation module 6, in which the control terminal calculates the outbound volume of the food based on the identified target food, and associates the inbound and outbound volumes of the food based on the ingredient label.
The steps for calculating the outbound volume of the ingredients are:
The pixel dimensions of a standard reference object of known size are obtained under the camera;
The conversion scaling factor is calculated from the actual size and pixel size of the standard reference object; the expression is: [expression missing in source];
Using OpenCV, images containing only the food ingredients are cropped from the video according to the output bounding box coordinates, and the food ingredient images and category labels are output;
The control terminal has a built-in monocular depth estimation model pre-trained on historical food images, which calculates the distance from each pixel within the target bounding box of the output food image to the camera and thereby generates a depth map;
All depth values in the depth map generated from the food image are averaged to obtain the average depth, which is then converted to the actual average height using the conversion scaling factor; the expression is: [expression missing in source];
The pixels within the target bounding box of the food image are counted, and the effective number of pixels in the food image is calculated from the converted actual average height; the expression is: [expression missing in source];
The volume of each effective pixel within the target bounding box of the food image is calculated, and the volumes are summed to obtain the outbound volume of the food; the pixel volume expression is: [expression missing in source]; the outbound volume expression is: [expression missing in source].
Waste volume calculation module 7, in which the control terminal calculates the waste volume of the food based on the identified inedible parts of the target food. The steps for calculating the waste volume of the food are:
The pixel dimensions of a standard reference object of known size are obtained under the camera;
The conversion scaling factor is calculated from the actual size and pixel size of the standard reference object; the expression is: [expression missing in source];
Using OpenCV, images containing only the inedible parts of the food are cropped from the video according to the output bounding box coordinates, and the food images and category labels are output;
The control terminal has a built-in monocular depth estimation model pre-trained on historical food images, which calculates the distance from each pixel within the target bounding box of the output food image to the camera and thereby generates a depth map;
All depth values in the depth map generated from the food image are averaged to obtain the average depth, which is then converted to the actual average height using the conversion scaling factor; the expression is: [expression missing in source];
The pixels within the target bounding box of the food image are counted, and the effective number of pixels in the food image is calculated from the converted actual average height; the expression is: [expression missing in source];
The volume of each effective pixel within the target bounding box of the food image is calculated, and the volumes are summed to obtain the waste volume of the food; the pixel volume expression is: [expression missing in source]; the waste volume expression is: [expression missing in source].
Discarded food identification module 8, in which the control terminal identifies the target food in the part of the stitched video marked by the video frame tags "Food waste begins" and "Food waste ends". The steps for identifying target ingredients include: The control terminal has a built-in YOLOv8n model pre-trained on historical food images. The YOLOv8n model identifies the target food in the part of the stitched video marked by the video frame tags "Food waste begins" and "Food waste ends", and outputs the target bounding box coordinates to complete the target food identification.
Discard volume calculation module 9, in which the control terminal calculates the discard volume of the food based on the identified target food. The steps for calculating the discard volume of the food are:
The pixel dimensions of a standard reference object of known size are obtained under the camera;
The conversion scaling factor is calculated from the actual size and pixel size of the standard reference object; the expression is: [expression missing in source];
Using OpenCV, images containing only the food ingredients are cropped from the video according to the output bounding box coordinates, and the food ingredient images and category labels are output;
The control terminal has a built-in monocular depth estimation model pre-trained on historical food images, which calculates the distance from each pixel within the target bounding box of the output food image to the camera and thereby generates a depth map;
All depth values in the depth map generated from the food image are averaged to obtain the average depth, which is then converted to the actual average height using the conversion scaling factor; the expression is: [expression missing in source];
The pixels within the target bounding box of the food image are counted, and the effective number of pixels in the food image is calculated from the converted actual average height; the expression is: [expression missing in source];
The volume of each effective pixel within the target bounding box of the food image is calculated, and the volumes are summed to obtain the discard volume of the food.
The pixel volume expression is: [expression missing in source]; the expression for the discard volume of the food is: [expression missing in source].
Food waste detection module 10, in which the control terminal detects food waste by comparing the calculated waste volume with the calculated discard volume. The steps for determining waste include: If the discard volume of the food is greater than the waste volume, waste is detected. If the discard volume equals the waste volume, no waste is detected.
Inventory forecasting module 11, in which the control terminal increases or decreases the inventory quantity according to the identified inbound and outbound ingredients to forecast inventory.
Inventory management module 12, in which the control terminal estimates the outbound volume of the ingredients from the calculated inbound volume and the preset volume shrinkage rate of the ingredients in the storage area, compares the estimated outbound volume with the associated outbound volume to determine whether there is a fault in the inventory area, and completes fault management.
The steps for presetting the volume shrinkage rate of ingredients in the ingredient storage area include: Each type of ingredient is placed in its corresponding storage area. 3D scanning is used to obtain the initial volume of each local area of the ingredients in every direction, and then to collect the storage volume of each local area in every direction at regular intervals. The storage volume shrinkage rate of each local area is calculated from the ratio between its initial volume and its storage volume in each period. Here, 3D scanning means scanning the ingredients with a 3D scanner to generate a three-dimensional model of them, and then using scanning software such as Geomagic Studio to calculate the volume of that model. The expression for the storage volume shrinkage rate of a local area is: [expression missing in source]. The shrinkage rates of the local areas are then compared: if the rate is the same in all directions, a single storage volume shrinkage rate is retained; if the rates differ between directions, all of the differing rates are retained and associated with their corresponding local areas. The retained storage volume shrinkage rate data are then preset.
The steps for estimating the outbound volume of the ingredients include: The outbound volume of the ingredients is estimated and updated based on the storage volume shrinkage rate; the expression is: [expression missing in source].
The steps for determining whether there is a fault in the inventory area include: If the estimated outbound volume of the food is less than the calculated outbound volume, there is a fault in the storage area. If the estimated outbound volume equals the calculated outbound volume, there is no fault in the storage area.
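The ingredient label ("ingredient number + ingredient volume") is what ties an inbound record to its later outbound record. A minimal sketch of building, parsing, and associating such labels follows; the literal `+` separator, the one-decimal volume formatting, and the record layout are assumptions, since the text gives only the general format.

```python
def make_label(ingredient_number: str, volume: float) -> str:
    """Build a label such as 'A001+1250.0' (separator is an assumption)."""
    return f"{ingredient_number}+{volume:.1f}"

def parse_label(label: str):
    """Split a label back into (ingredient_number, volume)."""
    number, volume = label.rsplit("+", 1)
    return number, float(volume)

def associate(inbound_labels, outbound_volumes):
    """Link each inbound label to the outbound volume of the same ingredient.

    outbound_volumes: dict mapping ingredient number to outbound volume;
    an ingredient with no outbound record yet gets None.
    """
    records = {}
    for label in inbound_labels:
        number, inbound = parse_label(label)
        records[number] = {
            "inbound": inbound,
            "outbound": outbound_volumes.get(number),
        }
    return records

labels = [make_label("A001", 1250.04), make_label("B002", 300.0)]
print(associate(labels, {"A001": 1100.0}))
```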
[0019] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A kitchen waste detection and intelligent inventory prediction system for catering establishments, characterized in that it comprises:
a video acquisition module (1), which uses cameras to capture videos of the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area;
a video stitching module (2), which uses the control terminal to stitch the captured videos of the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area into an overall video and to generate video tags;
a food ingredient entry identification module (3), which uses the control terminal to identify the target food in the portion of the stitched video carrying the video tag "food ingredient entry";
a storage volume calculation module (4), which uses the control terminal to calculate the storage (inbound) volume of the food from the identified target food and to generate a food label based on that storage volume;
a food outbound identification module (5), which uses the control terminal to identify the target food, including its edible and inedible parts, in the portion of the stitched video carrying the video tag "food outbound";
an outbound volume calculation module (6), which uses the control terminal to calculate the outbound volume of the food from the identified target food and to associate the inbound and outbound volumes of the food by means of the food label;
a waste volume calculation module (7), which calculates the waste volume of the food from the inedible part of the identified target food;
a discarded food identification module (8), which uses the control terminal to identify the target food in the portion of the stitched video carrying the video tag "discarded food";
a discard volume calculation module (9), which uses the control terminal to calculate the discard volume of the food from the identified target food;
a food waste detection module (10), which uses the control terminal to detect food waste based on the calculated waste volume and the calculated discard volume;
an inventory forecasting module (11), which uses the control terminal to forecast the inventory by increasing or decreasing the stock quantity according to the identified inbound and outbound ingredients; and
an inventory management module (12), which uses the control terminal to estimate the outbound volume of the ingredients from the calculated inbound volume and the preset volume shrinkage rate of the ingredients in the storage area, to compare the estimated outbound volume with the associated calculated outbound volume in order to determine whether there is a fault in the inventory area, and to complete fault management.
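The increase and decrease of stock performed by the inventory forecasting module (11) can be illustrated with a minimal sketch. The function name `forecast_inventory`, the event tuple layout, and the sample ingredients are hypothetical and are not specified in the claims; the sketch only assumes that each identified inbound event adds to the stock and each identified outbound event subtracts from it.

```python
from collections import defaultdict


def forecast_inventory(opening_stock, events):
    """Project stock levels from identified inbound/outbound events.

    opening_stock: dict mapping ingredient name -> quantity on hand.
    events: iterable of (ingredient, direction, quantity) tuples, where
            direction is "in" (inbound) or "out" (outbound). This layout
            is an assumption made for illustration.
    """
    stock = defaultdict(float, opening_stock)
    for ingredient, direction, quantity in events:
        if direction == "in":
            stock[ingredient] += quantity   # inbound raises the forecast
        elif direction == "out":
            stock[ingredient] -= quantity   # outbound lowers the forecast
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    return dict(stock)


# Hypothetical sample data
events = [("cabbage", "in", 10), ("cabbage", "out", 4), ("tofu", "in", 6)]
projected = forecast_inventory({"cabbage": 2}, events)
print(projected)
```

A signed running total is the simplest reading of "increase or decrease the quantity"; any demand model layered on top of it would be outside what the claim states.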
2. The kitchen waste detection and intelligent inventory prediction system for catering establishments according to claim 1, characterized in that: the video stitching module (2) stitches the videos of the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area by the following steps:
placing a checkerboard calibration board within the common field of view of the cameras in the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area, and collecting multiple calibration images;
calculating the intrinsic parameter matrix and distortion coefficients of each camera from the coordinates extracted from the multiple calibration images, using OpenCV and Zhang's calibration method built into the control terminal;
taking the camera in the kitchen food storage area as the reference camera, and the cameras in the kitchen food out storage area and the kitchen waste bin area as the cameras to be stitched;
placing multiple non-collinear calibration objects within the common field of view of the cameras in the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area, and collecting multiple calibration images;
solving the extrinsic parameter matrix of each camera to be stitched with the solvePnP algorithm, based on the world coordinates and pixel coordinates of the multiple non-collinear calibration objects;
aligning the video frames captured by the cameras in the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area according to their timestamps;
correcting the distortion of the video frames using the calculated distortion coefficients of each camera;
denoising the video frames using 3×3 Gaussian filtering and 5×5 median filtering;
unifying the resolution of the video frames of the cameras to be stitched with the resolution of the reference camera using a bilinear interpolation algorithm;
extracting feature points of the video frames using the ORB algorithm, in which high-contrast pixels in the video frame are first detected by FAST corner detection as candidate feature points, the principal direction of each candidate feature point is then calculated by the gray-scale centroid method, and finally a rotation-invariant BRIEF descriptor is generated;
taking the current video frame of the reference camera as the reference frame and the current video frame of the camera to be stitched as the frame to be matched, matching the BRIEF descriptors with the FLANN matcher, filtering the matching pairs by a set Hamming distance threshold, eliminating false matches with the RANSAC algorithm, and determining the homography matrix; and
mapping the coordinate system of the frame to be matched to the coordinate system of the reference frame with the warpPerspective function based on the homography matrix, thereby performing spatial alignment and completing the stitching.
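The timestamp alignment step can be sketched as a nearest-timestamp pairing between the reference camera and one camera to be stitched. This is an illustrative sketch only: the claim does not state how frames are paired, so the nearest-neighbour rule, the function names, the millisecond units, and the `max_gap_ms` tolerance are all assumptions.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Return the index of the value in sorted `timestamps` closest to t."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Choose the closer of the two neighbours around the insertion point.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align_frames(reference_ts, other_ts, max_gap_ms):
    """Pair each reference frame with the nearest frame of another camera.

    Timestamps are sorted integers in milliseconds (an assumption).
    Pairs whose gap exceeds max_gap_ms (a hypothetical tolerance) are dropped.
    Returns a list of (reference_index, other_index) tuples.
    """
    if not other_ts:
        return []
    pairs = []
    for ref_idx, t in enumerate(reference_ts):
        j = nearest_index(other_ts, t)
        if abs(other_ts[j] - t) <= max_gap_ms:
            pairs.append((ref_idx, j))
    return pairs


# Hypothetical timestamps: reference camera at 25 fps, second camera irregular
reference_ts = [0, 40, 80, 120]
outbound_ts = [10, 50, 300]
pairs = align_frames(reference_ts, outbound_ts, max_gap_ms=20)
print(pairs)
```

Only the paired frames would then go on to the distortion correction, denoising, and feature matching steps; reference frames without a close enough partner are skipped in this sketch.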
3. The kitchen waste detection and intelligent inventory prediction system for catering establishments according to claim 2, characterized in that: the video tag generation steps of the video stitching module (2) include: labeling the start and end video frames of the kitchen food storage area, the kitchen food out storage area, and the kitchen waste bin area with video frame tags, namely "Food storage begins", "Food storage ends", "Food out storage begins", "Food out storage ends", "Food waste begins", and "Food waste ends".
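To show how the start and end tags delimit the portions of the stitched video that later modules process, the following sketch pairs each start tag with the next matching end tag. The function name `extract_segments`, the `(frame_index, tag)` layout, and the sample frame indices are hypothetical; only the tag strings are taken from the claim.

```python
def extract_segments(tagged_frames, start_tag, end_tag):
    """Return (start_frame, end_frame) pairs delimited by the given tags.

    tagged_frames: list of (frame_index, tag) in playback order
                   (an assumed representation of the video frame tags).
    """
    segments = []
    start = None
    for frame_index, tag in tagged_frames:
        if tag == start_tag:
            start = frame_index                    # open a new segment
        elif tag == end_tag and start is not None:
            segments.append((start, frame_index))  # close the open segment
            start = None
    return segments


# Hypothetical tagged frames from a stitched video
tagged = [
    (12, "Food storage begins"),
    (40, "Food storage ends"),
    (55, "Food waste begins"),
    (70, "Food waste ends"),
    (90, "Food storage begins"),
    (130, "Food storage ends"),
]
storage_segments = extract_segments(tagged, "Food storage begins", "Food storage ends")
print(storage_segments)
```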
4. The kitchen waste detection and intelligent inventory prediction system for catering establishments according to claim 3, characterized in that: the target food identification steps of the food ingredient entry identification module (3) include: the control terminal has a built-in YOLOv8n model pre-trained on historical food images; the YOLOv8n model identifies the target food in the video frames of the stitched video between the tags "Food storage begins" and "Food storage ends", and outputs the coordinates of the target bounding box to complete the target food identification. The target food identification steps of the food outbound identification module (5) are the same as those of the food ingredient entry identification module (3), except that the video frame tags in the stitched video are "Food out storage begins" and "Food out storage ends". The target food identification steps of the discarded food identification module (8) are the same as those of the food ingredient entry identification module (3), except that the video frame tags in the stitched video are "Food waste begins" and "Food waste ends".
5. The kitchen waste detection and intelligent inventory prediction system for catering establishments according to claim 4, characterized in that: the food storage volume calculation steps of the storage volume calculation module (4) include:
obtaining the pixel dimensions of a standard reference object of known size under the camera;
calculating the conversion ratio from the actual size and the pixel size of the standard reference object;
cropping, with OpenCV, images containing only the food from the video according to the output bounding box coordinates, and outputting the food images and category labels;
generating a depth map with a monocular depth estimation model built into the control terminal and pre-trained on historical food images, the model calculating the distance from each pixel within the target bounding box of the output food image to the camera;
averaging all depth values in the depth map generated from the food image to obtain the average depth;
converting the average depth into the actual average height using the conversion ratio;
counting the number of pixels within the target bounding box of the food image, and calculating the number of effective pixels in the food image based on the converted actual average height; and
calculating the volume corresponding to each effective pixel within the target bounding box of the food image and summing these volumes to obtain the storage volume of the food.
The outbound volume calculation steps of the outbound volume calculation module (6) are the same as the storage volume calculation steps of the storage volume calculation module (4). The waste volume calculation steps of the waste volume calculation module (7) are the same as the storage volume calculation steps of the storage volume calculation module (4). The discard volume calculation steps of the discard volume calculation module (9) are the same as the storage volume calculation steps of the storage volume calculation module (4).
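One possible reading of the volume calculation is sketched below. The claim does not define "effective pixel" or state how the average depth becomes a height, so the following are assumptions made for illustration: effective pixels are those with a positive depth value, a hypothetical `height_scale` factor converts the average depth into centimetres, and each effective pixel contributes a column whose footprint is the square of the conversion ratio.

```python
def conversion_ratio(actual_size_cm, pixel_size_px):
    """Centimetres represented by one pixel, from the standard reference object."""
    return actual_size_cm / pixel_size_px


def estimate_volume_cm3(depth_map, cm_per_px, height_scale):
    """Estimate food volume from a depth map cropped to the bounding box.

    depth_map: list of rows of depth values (model units).
    cm_per_px: conversion ratio from conversion_ratio().
    height_scale: hypothetical factor turning average depth into height in cm.
    """
    # Assumption: a pixel is "effective" when its depth value is positive.
    effective = [d for row in depth_map for d in row if d > 0]
    if not effective:
        return 0.0
    average_depth = sum(effective) / len(effective)
    average_height_cm = average_depth * height_scale
    pixel_area_cm2 = cm_per_px ** 2
    # Sum the volume of the column above each effective pixel.
    return sum(pixel_area_cm2 * average_height_cm for _ in effective)


# Hypothetical values: a 10 cm reference object spans 200 pixels
ratio = conversion_ratio(10.0, 200.0)
depth_map = [[2.0, 2.0, 0.0],
             [4.0, 4.0, 0.0]]
volume = estimate_volume_cm3(depth_map, ratio, height_scale=1.5)
print(volume)
```

Because every column uses the same average height, the sum reduces to the effective pixel count times the pixel footprint times the average height; the explicit summation is kept to mirror the wording of the claim.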
6. The kitchen waste detection and intelligent inventory prediction system for catering establishments according to claim 5, characterized in that: the food waste detection steps of the food waste detection module (10) include: if the discard volume of the food is greater than the waste volume of the food, it is determined that waste exists; if the discard volume of the food equals the waste volume of the food, it is determined that no waste exists.
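The comparison rule can be sketched as follows. The claim covers only the "greater than" and "equal" cases, so two details here are assumptions: measured volumes are compared within a hypothetical tolerance rather than for exact equality, and the uncovered case (discard volume below waste volume) is reported as "undetermined".

```python
import math


def detect_waste(discard_volume, waste_volume, tolerance=1e-6):
    """Classify waste from the discard volume and the inedible-part waste volume.

    tolerance is a hypothetical absolute tolerance for treating two
    measured volumes as equal.
    """
    if math.isclose(discard_volume, waste_volume, abs_tol=tolerance):
        return "no waste"        # only the inedible part was discarded
    if discard_volume > waste_volume:
        return "waste"           # more was discarded than the inedible part
    return "undetermined"        # case not covered by the claim


print(detect_waste(1200.0, 900.0))
print(detect_waste(900.0, 900.0))
```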
7. The kitchen waste detection and intelligent inventory prediction system for catering establishments according to claim 6, characterized in that: presetting the volume shrinkage rate of the ingredients in the storage area in the inventory management module (12) includes:
placing each type of ingredient in its corresponding storage area;
obtaining the initial volume of each local area of the ingredients from all directions by 3D scanning;
periodically collecting the storage volume of each local area of the ingredients from all directions by 3D scanning;
calculating the storage volume shrinkage rate of each local area of the ingredients based on the ratio of its initial volume from all directions to its storage volume from all directions in each period;
judging the similarity of the storage volume shrinkage rates of the local areas of the ingredients: if the storage volume shrinkage rates of the local areas are the same in all directions, a single storage volume shrinkage rate is retained; if they differ, all the differing storage volume shrinkage rates are retained and associated with their corresponding local areas; and
presetting the retained storage volume shrinkage rate data of the ingredients.
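The shrinkage rate presetting can be sketched as below. The claim only says the rate is based on the ratio between the initial and periodic volumes, so the sketch assumes the rate is the retained fraction (periodic volume divided by initial volume); the function names, the local area keys, and the equality tolerance are likewise hypothetical.

```python
def shrinkage_rates(initial_volumes, period_volumes):
    """Per-area rate, assumed here to be period volume / initial volume.

    Both arguments map a local area name to a scanned volume.
    """
    return {
        area: period_volumes[area] / initial_volumes[area]
        for area in initial_volumes
    }


def retained_rates(rates, tolerance=1e-9):
    """Keep one rate if all local areas agree, otherwise keep every rate.

    The key "all" for the single retained rate is a hypothetical convention.
    """
    values = list(rates.values())
    if not values:
        return {}
    if all(abs(v - values[0]) <= tolerance for v in values):
        return {"all": values[0]}
    return dict(rates)   # differing rates stay associated with their areas


# Hypothetical 3D-scan volumes in cubic centimetres
initial = {"top": 100.0, "side": 100.0}
uneven = shrinkage_rates(initial, {"top": 90.0, "side": 80.0})
uniform = shrinkage_rates(initial, {"top": 90.0, "side": 90.0})
print(retained_rates(uneven))
print(retained_rates(uniform))
```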
8. The kitchen waste detection and intelligent inventory prediction system for catering establishments according to claim 7, characterized in that: the food outbound volume estimation step of the inventory management module (12) includes: estimating the outbound volume of the food as the product of the calculated inbound volume of the food and the preset storage volume shrinkage rate.
9. The kitchen waste detection and intelligent inventory prediction system for catering establishments according to claim 8, characterized in that: the steps by which the inventory management module (12) determines whether there is a fault in the inventory area include: if the estimated outbound volume of the food is less than the calculated outbound volume of the food, there is a fault in the storage area; if the estimated outbound volume of the food equals the calculated outbound volume of the food, there is no fault in the storage area.
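The estimation and fault determination can be sketched together. The sketch assumes, following claim 1, that the estimated outbound volume is the calculated inbound volume multiplied by the preset shrinkage rate. The function names and the tolerance are hypothetical, and the case not covered by the claim (estimate above the calculated value) returns `None`.

```python
import math


def estimate_outbound_volume(inbound_volume, shrinkage_rate):
    """Assumed estimate: inbound volume scaled by the preset shrinkage rate."""
    return inbound_volume * shrinkage_rate


def storage_fault(estimated_outbound, calculated_outbound, tolerance=1e-6):
    """Return True for a fault, False for no fault, None if not covered.

    tolerance is a hypothetical absolute tolerance for equality.
    """
    if math.isclose(estimated_outbound, calculated_outbound, abs_tol=tolerance):
        return False     # estimate matches the measured outbound volume
    if estimated_outbound < calculated_outbound:
        return True      # fault in the storage area per the claim
    return None          # claim does not state this case


# Hypothetical volumes in cubic centimetres
estimated = estimate_outbound_volume(1000.0, 0.9)
print(storage_fault(estimated, 950.0))
print(storage_fault(estimated, 900.0))
```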