Method and system for identifying empty / full condition of scrap slag pot
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- INNER MONGOLIA UNIV OF SCI & TECH
- Filing Date
- 2023-06-20
- Publication Date
- 2026-06-26
Figure CN116704418B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of waste steel slag container identification technology, and in particular to a method and system for identifying the empty or full state of a waste steel slag container.
Background Technology
[0002] Modern industrialized plants, adhering to the principles of practicality, reliability, and economy, incorporate artificial intelligence (AI) methods within existing infrastructure to replace manual labor in related tasks. The automatic slag pot spraying system utilizes the existing mixing and pressurizing system, adding detection and judgment functions such as pot emptying, vehicle speed, video, and pressure control. Employing multi-functional nozzles, it achieves automatic, rapid (1-2 seconds), quantitative, and uniform spraying, thereby improving spraying quality and reducing costs. Detection of the empty / full status of scrap steel slag pots in complex scenarios is a crucial component of this system.
[0003] Existing solutions include intelligent measurement and control technology and target detection technology. Intelligent measurement and control technology involves using cameras to photograph the tanker truck's tires and combining this with physical quantities such as tire pressure and size to calculate the tire's load-bearing capacity, ultimately estimating the tanker's empty / full status. Target detection technology involves classifying tankers according to their internal slag content (empty / full), creating a database of state image samples, and using target detection algorithms to learn the tanker's state.
[0004] The first approach is significantly affected by external factors. Considering that tire size and pressure can vary even for the same brand and model, and that some slag may adhere to the inner wall of the can, and that the slag is relatively lightweight, this method is less feasible. While the second approach addresses this issue, the varying shapes of the slag within the can make classification a challenge. Furthermore, environmental factors such as strong light, rain, and heavy snow can affect the slag's internal characteristics, making classification difficult and hindering learning. Therefore, the existing methods have low accuracy.
[0005] The information disclosed in this background section is intended only to enhance the understanding of the overall background of the invention and should not be construed as an admission or in any way implying that the information constitutes prior art known to those skilled in the art.
Summary of the Invention
[0006] The purpose of this invention is to provide a method and system for identifying the empty or full state of waste steel slag containers. This method can identify the empty or full state of waste steel slag containers under various environmental conditions, with high accuracy, improved work efficiency, reduced labor costs, and enhanced safety.
[0007] To achieve the above objectives, in a first aspect, the present invention provides a method for identifying the empty and full states of a waste steel slag container, applicable to various environmental conditions. The method includes: step S100, acquiring a video of the waste steel slag container and preprocessing it to generate a target detection dataset and a state recognition dataset; step S200, inputting the target detection dataset into a target detection model for training to generate a detection model; step S300, inputting the state recognition dataset into a recognition model for training to generate a state recognition model; step S400, processing the video of the waste steel slag container to be tested to generate data to be detected; step S500, inputting the data to be detected into the detection model for detection to generate a target to be identified; step S600, inputting the target to be identified into the state recognition model to generate a recognition result; and step S700, inputting the recognition result into an interactive interface module to guide the slag spraying operation.
[0008] In one embodiment of the present invention, acquiring a video of a waste steel slag container and performing preprocessing to generate a target detection dataset and a state recognition dataset includes: step S101, acquiring a video of a waste steel slag container and performing data cleaning to obtain image data; step S102, generating a target detection dataset and a state recognition dataset based on the image data.
[0009] In one embodiment of the present invention, the target detection dataset is input into the target detection model for training, and the generation of the detection model includes: inputting the target detection dataset into the Yolov7 target detector of the target detection model for training, predicting the classification regression box of the slag pot at multiple scales, generating the detection model, and performing pruning operations to compress the detection model.
[0010] In one embodiment of the present invention, the state recognition dataset is input into the recognition model for training to generate the state recognition model, which includes: inputting the state recognition dataset into the recognition model, the recognition model reading data in batches to form positive and negative sample pairs, using a ResNet50 network to extract features and fusing 1 / 16 and 1 / 32 scale features, calculating the distance between positive and negative samples using a preset algorithm, optimizing the model using a preset method, and using amplitude iterative pruning to compress the model to generate the state recognition model.
[0011] In one embodiment of the present invention, processing the video of the waste steel slag container to be tested to generate the data to be detected includes: sampling the video by extracting 1 frame at intervals of 4 frames, and resizing each extracted frame to a preset image size to generate the data to be detected.
[0012] In one embodiment of the present invention, inputting the data to be detected into the detection model for detection and generating the target to be identified includes: inputting the data to be detected into the detection model, and the detection model cutting out the detected slag pot target to generate the target to be identified.
[0013] In one embodiment of the present invention, inputting the target to be identified into the state recognition model to generate a recognition result includes: inputting the target to be identified into the state recognition model, wherein the state recognition model performs feature extraction and feature fusion on the target to be identified, calculates the distance between the target to be identified and a preset target, determines the empty / full state, and generates a recognition result.
[0014] In one embodiment of the present invention, inputting the identification result into the interactive interface module to guide the slag spraying operation includes: inputting the identification result into the interactive interface module, realizing visualization through a Web application framework, thereby guiding the slag spraying operation.
[0015] Secondly, the present invention provides a system for identifying the empty / full state of a scrap steel slag container. Based on the aforementioned method for identifying the empty / full state of a scrap steel slag container, the system includes: a first generation module, a second generation module, a third generation module, a fourth generation module, a fifth generation module, a sixth generation module, and a visualization module. The first generation module is used to acquire a video of the scrap steel slag container and perform preprocessing to generate a target detection dataset and a state recognition dataset. The second generation module is used to input the target detection dataset into a target detection model for training to generate a detection model. The third generation module is used to input the state recognition dataset into a recognition model for training to generate a state recognition model. The fourth generation module is used to process the video of the scrap steel slag container to be tested to generate data to be detected. The fifth generation module is used to input the data to be detected into the detection model for detection to generate a target to be identified. The sixth generation module is used to input the target to be identified into the state recognition model to generate a recognition result. The visualization module is used to input the recognition result into an interactive interface module to guide the slag spraying operation.
[0016] In one embodiment of the present invention, the step of acquiring the video of the waste steel slag container and performing preprocessing to generate a target detection dataset and a state recognition dataset includes: acquiring the video of the waste steel slag container and performing data cleaning to obtain image data; and generating a target detection dataset and a state recognition dataset based on the image data.
[0017] Compared with the prior art, the method and system for identifying the empty and full state of waste steel slag tanks according to the present invention can identify the empty and full state of waste steel slag tanks under various environmental conditions, with high identification accuracy, improved work efficiency, reduced labor costs, and enhanced safety.
Attached Figure Description
[0018] Figure 1 is a flowchart illustrating a method for identifying the empty or full state of a waste steel slag container according to Embodiment 1 of the present invention.
[0019] Figure 2 is a flowchart illustrating the training phase of a method for identifying the empty or full state of a waste steel slag container according to Embodiment 1 of the present invention.
[0020] Figure 3 is a flowchart illustrating the testing phase of a method for identifying the empty or full state of a waste steel slag container according to Embodiment 1 of the present invention.
[0021] Figure 4 is a schematic diagram of the structure of a system for identifying the empty or full state of a waste steel slag tank according to Embodiment 2 of the present invention.
[0022] Figure 5 is a schematic diagram of a video frame in Embodiment 3 of the present invention.
[0023] Figure 6 is a schematic diagram of the camera installation position in Embodiment 3 of the present invention.
[0024] Figure 7 is a schematic diagram of the can number logic judgment in Embodiment 3 of the present invention.
[0025] Figure 8 is a schematic diagram of amplitude iterative pruning in Embodiment 3 of the present invention.
[0026] Figure 9 is a schematic diagram of the identification results of the empty / full state of waste steel slag containers in Embodiment 3 of the present invention.
[0027] Explanation of key figure labels:
[0028] 401 - First generation module, 402 - Second generation module, 403 - Third generation module, 404 - Fourth generation module, 405 - Fifth generation module, 406 - Sixth generation module, 407 - Visualization module.
Detailed Implementation
[0029] The specific embodiments of the present invention will now be described in detail with reference to the accompanying drawings, but it should be understood that the scope of protection of the present invention is not limited to the specific embodiments.
[0030] Unless otherwise expressly stated, throughout the specification and claims, the term "comprising" or its variations such as "including" or "comprises" shall be understood to include the stated elements or components without excluding other elements or other components.
[0031] To facilitate understanding, the main implementation concepts of the various embodiments of the present invention will be briefly described first.
[0032] Analysis of the data and images shows that recognition accuracy is lower at night because of poor lighting. The inner walls of empty cans are severely corroded by long-term filling with waste and exhibit patterned textures, which makes it a significant challenge to identify whether a can is empty or full. If a misjudgment of the empty / full status leads to missed coating, the can may be damaged, increasing the scrap rate, or an explosion may even occur, endangering the lives of workers.
[0033] The inventors, having discovered the technical defects described in the background section, sought to find a method and system for identifying the empty or full state of waste steel slag containers. This system would be able to identify the empty or full state of waste steel slag containers under various environmental conditions, including daytime, nighttime, rainy days, and foggy weather, and would send the final identification result to the receiver in the form of a pulse signal.
[0034] In the field of computer vision, object classification is a relatively simple and easy-to-implement method. Object detection extends classification: it must also localize objects in an image, outputting bounding boxes together with classification labels. Deep learning-based object detection algorithms are mainly divided into two categories: R-CNN series algorithms and YOLO series algorithms. Among these, YOLOv7-based detection is currently one of the most widely used approaches. A class is created for each type of object to be identified, and after the features of each type have been learned, the classification and regression task is completed. Given the high detection speed of the YOLO series algorithms, this invention divides the recognition task into two stages: YOLOv7 is first used to detect and crop the waste steel slag can target, and the cropped target is then passed to the subsequent recognition task, which effectively reduces background interference and improves recognition accuracy.
[0035] The purpose of metric learning is to find an optimal distance metric for calculating the similarity between samples; by reducing the distance between similar samples and increasing the distance between different samples, it aims to achieve the desired result. Commonly used distance metrics include Euclidean distance, Mahalanobis distance, Manhattan distance, and cosine similarity. Facial recognition technology is a common example of a technology that utilizes metric learning-based methods.
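As a small illustration of three of the distance metrics named above, the following sketch computes them for plain Python lists. The function names and sample vectors are illustrative assumptions and do not come from the patent; Mahalanobis distance is omitted because it additionally requires a covariance matrix.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine of the angle between the vectors; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Two vectors pointing in the same direction but with different lengths.
u = [3.0, 4.0]
w = [6.0, 8.0]
print(euclidean(u, w), manhattan(u, w), cosine_similarity(u, w))
```

The example shows why the choice of metric matters: the two vectors are far apart in Euclidean and Manhattan terms, yet their cosine similarity is maximal because they differ only in magnitude.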
[0036] Re-identification (ReID) technology, or person re-identification technology, utilizes distance calculations in metric learning to search for targets in an image database, belonging to a sub-problem of image retrieval. In real-world scenarios, due to the influence of monitoring angle, distance, and clarity, faces may not be captured, rendering face recognition technology unusable. Therefore, ReID technology was proposed to replace face recognition in finding targets within video sequences, and it is widely used in security systems, personal location systems, and other services. Since identifying people and recognizing the empty / full state of containers are similar tasks, and ReID technology should have a greater advantage in rigid body recognition, the re-identification technology suitable for people can also be applied to recognizing the empty / full state of scrap steel slag containers. Therefore, this invention, based on traditional ReID technology, makes certain improvements to the network architecture to achieve higher-quality recognition.
[0037] This invention also proposes a model lightweighting technique. Model lightweighting is needed to reduce the model size, remove redundant parameters, improve computational speed, and meet energy-saving and environmental protection requirements. Existing model lightweighting techniques include distillation, pruning, and quantization. Distillation, within a teacher-student framework, extracts the feature representations ("knowledge") learned by a complex network with high learning capacity and transfers them to a network with fewer parameters and weaker learning capacity. Pruning involves training the model, removing a certain proportion of weights (structured or unstructured) by setting them to zero, and then retraining the resulting sparse network to fine-tune it, ultimately obtaining a sparse model that retains only a fraction of the original model's parameters. Following the lottery ticket hypothesis, this invention uses iterative magnitude pruning (amplitude iterative pruning), initializing the sub-network weights with the original network's initial weights, which outperforms pruning methods that use random weight initialization.
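The pruning loop described above can be sketched roughly as follows. This is a toy illustration over a flat list of weights, not the patent's implementation: the function names, the pruning fraction, the number of rounds, and the stand-in `toy_train` function are all assumptions made for the example.

```python
def magnitude_prune_mask(weights, mask, fraction):
    # Disable `fraction` of the still-active weights, smallest magnitude first.
    active = [i for i, m in enumerate(mask) if m]
    n_prune = int(len(active) * fraction)
    to_prune = sorted(active, key=lambda i: abs(weights[i]))[:n_prune]
    new_mask = list(mask)
    for i in to_prune:
        new_mask[i] = 0
    return new_mask

def iterative_magnitude_pruning(init_weights, train_fn, rounds, fraction):
    mask = [1] * len(init_weights)
    for _ in range(rounds):
        # Rewind: each round restarts from the ORIGINAL initial weights, masked.
        start = [w * m for w, m in zip(init_weights, mask)]
        trained = train_fn(start, mask)
        mask = magnitude_prune_mask(trained, mask, fraction)
    # The surviving sub-network, initialized with the original weights.
    sub_network_init = [w * m for w, m in zip(init_weights, mask)]
    return mask, sub_network_init

def toy_train(weights, mask):
    # Hypothetical stand-in for real training: just scales the active weights.
    return [2.0 * w for w in weights]

init = [0.5, -0.1, 0.3, -0.8, 0.05, 0.9, -0.4, 0.2]
mask, sub_init = iterative_magnitude_pruning(init, toy_train, rounds=2, fraction=0.25)
print(mask)
```

In a real setting the sub-network returned at the end would be trained once more from `sub_network_init`; the key point of the sketch is that surviving weights are reset to their original initial values rather than re-drawn at random.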
[0038] Example 1
[0039] Figure 2 is a flowchart illustrating the training phase of a method for identifying the empty or full state of a waste steel slag container according to Embodiment 1 of the present invention. Figure 3 is a flowchart illustrating the testing phase of a method for identifying the empty / full state of a waste steel slag container according to Embodiment 1 of the present invention. As shown in Figures 1 to 3, Example 1 provides a method for identifying the empty or full state of a waste steel slag container that can be applied to various environmental conditions. The method includes:
[0040] Step S100: Acquire video of the waste steel slag tank and preprocess it to generate target detection dataset and state recognition dataset;
[0041] Step S200: Input the target detection dataset into the target detection model for training to generate a detection model;
[0042] Step S300: Input the state recognition dataset into the recognition model for training to generate the state recognition model;
[0043] Step S400: Process the video of the waste steel slag container to be tested to generate the data to be detected;
[0044] Step S500: Input the data to be detected into the detection model for detection to generate the target to be identified;
[0045] Step S600: Input the target to be identified into the state recognition model to generate a recognition result;
[0046] Step S700: Input the recognition result into the interactive interface module to guide the slag spraying operation.
[0047] Specifically, the model training phase consists of three steps: image preprocessing, object detection model training, and state recognition model training. The model deployment and testing phase consists of four steps: data preprocessing, object detection model testing, state recognition model testing, and data interaction interface.
[0048] In this embodiment, step S100 includes: step S101, acquiring video of waste steel slag container and cleaning the data to obtain image data; step S102, generating target detection dataset and state recognition dataset based on the image data.
[0049] Specifically, the video processing first involves data cleaning, extracting the video of the time period when the train passes by, and converting it into image data by extracting 1 frame out of 20 frames, and then creating object detection datasets and state recognition datasets respectively.
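The frame sampling described above can be sketched as a simple generator. The function name `every_nth` is an illustrative assumption, and decoding the video into frames (for example with a video library) is deliberately left out, so the sketch works on any iterable of frames.

```python
def every_nth(frames, step):
    # Yield (index, frame) for one frame out of every `step` frames.
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield index, frame

# Stand-in for 100 decoded frames; a real pipeline would pass image arrays.
fake_frames = range(100)
kept = [index for index, _ in every_nth(fake_frames, step=20)]
print(kept)
```

With `step=20` this keeps one frame out of every 20, as used for building the datasets; the same helper can be reused with a different step at inference time.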
[0050] In this embodiment, step S200 includes: inputting the target detection dataset into the Yolov7 target detector of the target detection model for training, predicting the classification regression box of the slag pot at multiple scales, generating the detection model, and performing pruning operations to compress the detection model.
[0051] Specifically, the completed target detection dataset is fed into the Yolov7 detection model for training, which includes detecting four types of targets: slag cans, can connectors, train tails, and train locomotives. The regression boxes of the four types of objects are predicted at multiple scales, and the trained detection model is then pruned to compress the model.
[0052] In this embodiment, step S300 includes: inputting the state recognition dataset into the recognition model, the recognition model reading data in batches to form positive and negative sample pairs, using a ResNet50 network to extract features and fusing 1 / 16 and 1 / 32 scale features, calculating the distance between positive and negative samples using a preset algorithm, optimizing the model using a preset method, and using amplitude iterative pruning to compress the model to generate the state recognition model.
[0053] Specifically, the completed state recognition dataset is fed into the recognition model for training. First, data is read in batches to form positive and negative sample pairs. Features are extracted using a ResNet50 network, and the 1 / 16 and 1 / 32 scale features are fused. A joint distance calculation formula is designed in which Euclidean distance and cosine distance are used together to calculate the distance between positive and negative samples. Measuring distance from these two perspectives improves the model's recognition accuracy. The model is optimized by joint training with center loss and triplet loss to reduce intra-class distance and increase inter-class distance. Finally, the amplitude iterative pruning technique is used to compress the model, completing the training phase.
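The text does not give the joint distance formula or the loss weights, so the following is only a sketch under stated assumptions: the joint distance is taken to be a weighted sum of Euclidean and cosine distance (weight `alpha`), the triplet loss uses a hypothetical margin of 0.3, and the center loss is averaged over the batch. All names and constants are illustrative.

```python
import math

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def joint_distance(a, b, alpha=0.5):
    # ASSUMPTION: weighted sum of the two distances; the patent's formula is not given.
    return alpha * math.sqrt(squared_euclidean(a, b)) + (1.0 - alpha) * cosine_distance(a, b)

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Pushes the negative at least `margin` farther from the anchor than the positive.
    return max(0.0, joint_distance(anchor, positive) - joint_distance(anchor, negative) + margin)

def center_loss(features, center):
    # Half the mean squared distance of same-class features to their class center.
    return 0.5 * sum(squared_euclidean(f, center) for f in features) / len(features)

anchor, positive, negative = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
easy = triplet_loss(anchor, positive, negative)   # well-separated triplet
hard = triplet_loss(anchor, negative, positive)   # roles swapped on purpose
compact = center_loss([[1.0, 0.0], [0.0, 1.0]], center=[0.5, 0.5])
print(easy, hard, compact)
```

The triplet term only produces a gradient signal when the negative is not yet far enough from the anchor, while the center term keeps shrinking each class around its center; training on a sum of the two reflects the stated goal of reducing intra-class distance and increasing inter-class distance.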
[0054] In this embodiment, step S400 includes: sampling the video of the waste steel slag container to be tested by extracting 1 frame at intervals of 4 frames, and resizing each extracted frame to a preset image size to generate the data to be detected.
[0055] In this embodiment, step S500 includes: inputting the data to be detected into the detection model, and the detection model trimming the detected slag pot target to generate the target to be identified.
[0056] Specifically, the video data is first fed into the model by sampling one frame every four frames, and the data is adjusted to a 640*640 image size before being fed into the detection model. The trained YOLOv7 model is used to detect the slag can, can connector, train tail, and train head. The detected train head provides a signal to start the recognition process, and the detected train tail provides a signal to end the process. The strategy of identifying the can connector to assist in updating the can number is used. The detected slag can targets are cropped out, adjusted to a 256*256 image size, and fed into the recognition model.
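The start, stop, and can-number logic described above could look roughly like the sketch below. The class names (`train_head`, `train_tail`, `can_connector`, `slag_can`) and the rule that the can number increases each time a connector newly appears are assumptions made for illustration; the patent's actual logic (Figure 7) is not reproduced here.

```python
def assign_can_numbers(frame_labels):
    # frame_labels: one set of detected class names per sampled frame.
    active = False
    can_number = 0
    connector_seen_prev = False
    assignments = []
    for index, labels in enumerate(frame_labels):
        if not active:
            if "train_head" not in labels:
                continue  # still waiting for the start signal
            active = True
            can_number = 0
            connector_seen_prev = False
        if "train_tail" in labels:
            active = False  # end signal: stop recognizing until the next train
            continue
        connector_now = "can_connector" in labels
        if connector_now and not connector_seen_prev:
            can_number += 1  # a connector newly entered the view: next can
        connector_seen_prev = connector_now
        if "slag_can" in labels and can_number > 0:
            assignments.append((index, can_number))
    return assignments

frames = [
    set(), {"train_head"}, {"can_connector"}, {"slag_can"}, {"slag_can"},
    {"can_connector"}, {"slag_can"}, {"train_tail"}, {"slag_can"},
]
print(assign_can_numbers(frames))
```

Each returned pair says which sampled frame contains a slag can crop and which can number it would be attributed to; the slag can seen after the train tail is ignored because recognition has already been stopped.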
[0057] In this embodiment, step S600 includes: inputting the target to be identified into the state recognition model, the state recognition model performing feature extraction and feature fusion on the target to be identified, calculating the distance between the target to be identified and a preset target, determining the empty / full state, and generating a recognition result.
[0058] Specifically, the state recognition model performs feature extraction and feature fusion on the input image data and calculates the distance between the input and the template tank. The threshold is set to 0.85: if the distance is greater than the threshold, the tank is judged to be non-empty; otherwise, it is judged to be empty. The recognition results for one train of tank cars are then transmitted as a "heartbeat" through a thread, and the data is saved.
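The threshold rule above reduces to a single comparison. In the sketch below, only the 0.85 threshold and the "greater than" rule come from the text; the function names, the dictionary layout keyed by can number, and the sample distances are illustrative assumptions.

```python
EMPTY_THRESHOLD = 0.85  # threshold stated in the text

def classify_can(distance_to_template, threshold=EMPTY_THRESHOLD):
    # Strictly greater than the threshold means non-empty; otherwise empty.
    return "non-empty" if distance_to_template > threshold else "empty"

def summarize_train(distances_by_can):
    # distances_by_can: {can_number: distance to the template tank} (hypothetical layout).
    return {can: classify_can(d) for can, d in sorted(distances_by_can.items())}

result = summarize_train({1: 0.31, 2: 0.92, 3: 0.85})
print(result)
```

Note the boundary case: a distance exactly equal to the threshold is treated as empty, because the rule requires the distance to exceed 0.85 for a non-empty judgment.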
[0059] In this embodiment, step S700 includes: inputting the recognition result into the interactive interface module, and realizing visualization through a Web application framework, thereby guiding the slag spraying operation.
[0060] Specifically, the interactive interface module receives the "heartbeat," transmits the received "heartbeat" to the receiver via Modbus, uses the Flask framework to visualize the real-time identification process, and transmits the results to the storage device and display terminal to guide the slag spraying operation.
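The text does not describe how the "heartbeat" is laid out for Modbus. One plausible encoding, shown here purely as an assumption, packs one bit per can into 16-bit words, since Modbus holding registers are 16 bits wide. The function name and bit order are hypothetical, and no Modbus client call is shown; the resulting words would be handed to whatever client library the deployment uses.

```python
def pack_states_to_registers(states):
    # states: list of booleans in can order, True = non-empty (ASSUMED encoding).
    registers = []
    for start in range(0, len(states), 16):
        word = 0
        for bit, is_non_empty in enumerate(states[start:start + 16]):
            if is_non_empty:
                word |= 1 << bit  # can `start + bit` maps to bit `bit` of this word
        registers.append(word)
    return registers

# Three cans: non-empty, empty, non-empty -> bits 0 and 2 set.
print(pack_states_to_registers([True, False, True]))
```

A bit-per-can layout keeps the message compact for trains of any length, at the cost of requiring the receiver to know the train's can count in order to ignore unused high bits.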
[0061] Example 2
[0062] Example 2 provides a system for identifying the empty or full state of a waste steel slag container. Based on the above-described method for identifying the empty or full state of a waste steel slag container, the system includes: a first generation module 401, a second generation module 402, a third generation module 403, a fourth generation module 404, a fifth generation module 405, a sixth generation module 406, and a visualization module 407. The first generation module 401 is used to acquire and preprocess video of scrap steel slag containers to generate a target detection dataset and a state recognition dataset; the second generation module 402 is used to input the target detection dataset into a target detection model for training to generate a detection model; the third generation module 403 is used to input the state recognition dataset into a recognition model for training to generate a state recognition model; the fourth generation module 404 is used to process the video of the scrap steel slag container to be tested to generate data to be detected; the fifth generation module 405 is used to input the data to be detected into the detection model for detection to generate a target to be recognized; the sixth generation module 406 is used to input the target to be recognized into the state recognition model to generate a recognition result; and the visualization module 407 is used to input the recognition result into an interactive interface module to guide the slag spraying operation.
[0063] Specifically, the model training phase consists of three steps: image preprocessing, object detection model training, and state recognition model training. The model deployment and testing phase consists of four steps: data preprocessing, object detection model testing, state recognition model testing, and data interaction interface.
[0064] In this embodiment, the step of acquiring the video of the waste steel slag container and performing preprocessing to generate the target detection dataset and the state recognition dataset includes: acquiring the video of the waste steel slag container and performing data cleaning to obtain image data; and generating the target detection dataset and the state recognition dataset based on the image data.
[0065] Specifically, the video processing first involves data cleaning, extracting the video of the time period when the train passes by, and converting it into image data by extracting 1 frame out of 20 frames, and then creating object detection datasets and state recognition datasets respectively.
[0066] In this embodiment, the object detection dataset is input into the object detection model for training. The generation of the detection model includes: inputting the object detection dataset into the Yolov7 object detector of the object detection model for training, predicting the classification regression box of the slag pot at multiple scales, generating the detection model, and performing pruning operations to compress the detection model.
[0067] Specifically, the completed target detection dataset is fed into the Yolov7 detection model for training, which includes detecting four types of targets: slag cans, can connectors, train tails, and train locomotives. The regression boxes of the four types of objects are predicted at multiple scales, and the trained detection model is then pruned to compress the model.
[0068] In this embodiment, the process of inputting the state recognition dataset into the recognition model for training and generating the state recognition model includes: inputting the state recognition dataset into the recognition model; the recognition model reads data in batches to form positive and negative sample pairs; extracting features using a ResNet50 network and fusing 1 / 16 and 1 / 32 scale features; calculating the distance between positive and negative samples using a preset algorithm; optimizing the model using a preset method; and compressing the model using amplitude iterative pruning to generate the state recognition model.
[0069] Specifically, the completed state recognition dataset is fed into the recognition model for training. First, data is read in batches to form positive and negative sample pairs. Features are extracted using a ResNet50 network, and the 1 / 16 and 1 / 32 scale features are fused. A joint distance calculation formula is designed in which Euclidean distance and cosine distance are used together to calculate the distance between positive and negative samples. Measuring distance from these two perspectives improves the model's recognition accuracy. The model is optimized by joint training with center loss and triplet loss to reduce intra-class distance and increase inter-class distance. Finally, the amplitude iterative pruning technique is used to compress the model, completing the training phase.
[0070] In this embodiment, processing the video of the waste steel slag container to be tested to generate the data to be detected includes: sampling the video by extracting 1 frame at intervals of 4 frames, and resizing each extracted frame to a preset image size to generate the data to be detected.
[0071] In this embodiment, inputting the data to be detected into the detection model for detection and generating the target to be identified includes: inputting the data to be detected into the detection model, and the detection model cropping the detected slag pot target to generate the target to be identified.
[0072] Specifically, the video data is first fed into the model by sampling one frame every four frames, and the data is adjusted to a 640*640 image size before being fed into the detection model. The trained YOLOv7 model is used to detect the slag can, can connector, train tail, and train head. The detected train head provides a signal to start the recognition process, and the detected train tail provides a signal to end the process. The strategy of identifying the can connector to assist in updating the can number is used. The detected slag can targets are cropped out, adjusted to a 256*256 image size, and fed into the recognition model.
[0073] In this embodiment, inputting the target to be identified into the state recognition model to generate a recognition result includes: inputting the target to be identified into the state recognition model, the state recognition model performing feature extraction and feature fusion on the target to be identified, calculating the distance between the target to be identified and a preset target, determining the empty / full state, and generating a recognition result.
[0074] Specifically, the state recognition model performs feature extraction and feature fusion on the input image data and calculates the distance between the input and the template tank. The threshold is set to 0.85: if the distance is greater than the threshold, the tank is judged to be non-empty; otherwise, it is judged to be empty. The recognition results for one train of tank cars are then transmitted as a "heartbeat" through a thread, and the data is saved.
[0075] In this embodiment, inputting the recognition result into the interactive interface module to guide the slag spraying operation includes: inputting the recognition result into the interactive interface module, realizing visualization through a Web application framework, thereby guiding the slag spraying operation.
[0076] Specifically, the interactive interface module receives the "heartbeat," transmits the received "heartbeat" to the receiver via Modbus, uses the Flask framework to visualize the real-time identification process, and transmits the results to the storage device and display terminal to guide the slag spraying operation.
[0077] Example 3
[0078] Example 3 provides a specific implementation method in practical applications, and the implementation method is as follows:
[0079] 1. Dataset Acquisition and Creation
[0080] 1.1 The dataset for this invention originates from video footage of waste steel slag tank transportation collected from a factory in a certain year. The video collection period was 24 months, with continuous collection 24 hours a day. Steel slag tank trucks transported one train every two hours, with one locomotive pushing N tank trucks in each train. The effective video is divided into 16432 segments, each one hour long, with a frame height of 704 pixels, a frame width of 576 pixels, a bitrate of 1639 kb/s, and an MPEG4 format. This invention aims to study the identification of the empty / full state of waste steel slag tanks; therefore, a camera was mounted directly above the steel slag tanks during transportation to facilitate obtaining detailed images of the tank's interior. The video footage is shown in Figure 5, and the camera installation location is shown in Figure 6.
[0081] 1.2 Object Detection Dataset Creation
[0082] Based on the tanker's operating speed and the number of images captured from each tank, an image sequence was formed by extracting one effective frame from every 20 frames, resulting in a total of 130,180 images. Of these, 74,144 were used as the training set, 13,018 as the validation set, and 43,018 as the test set. Since this invention needs to consider functions such as saving tanker identification results by train segment, the target detection model, in addition to identifying tanks, will also identify tank connectors, train tails, and train locomotives. Labelme annotation software was used to label the four types of targets to be identified.
[0083] 1.3 Target Recognition Dataset Creation
[0084] The acquired images of the cans are categorized, based on whether the cans are empty or full and on the color of the sky (this can be done manually), into 'daytime empty cans', 'nighttime empty cans', 'daytime not empty cans', and 'nighttime not empty cans'. The images are then renamed. The first four digits of the name represent the category of the target in the image: 'daytime not empty cans' is '0000'; 'daytime empty cans' is '0001'; 'nighttime not empty cans' is '0002'; and 'nighttime empty cans' is '0003'. The last two characters represent the position of the can within the camera's field of view: 'c1' for cans halfway in, 'c2' for cans fully in, and 'c3' for cans halfway out.
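The naming convention can be decoded with two lookups. The sketch below relies only on what is stated above (the first four characters give the category and the last two give the position); whatever lies between them, such as the serial number and underscores in the example name, is a hypothetical placeholder.

```python
CATEGORIES = {
    "0000": "daytime not empty cans",
    "0001": "daytime empty cans",
    "0002": "nighttime not empty cans",
    "0003": "nighttime empty cans",
}
POSITIONS = {"c1": "halfway in", "c2": "fully in", "c3": "halfway out"}


def parse_image_name(stem):
    """Decode category and position from an image name without its file extension."""
    return CATEGORIES[stem[:4]], POSITIONS[stem[-2:]]


# "0001_000123_c2" is an invented example; only its first four and last two characters matter.
category, position = parse_image_name("0001_000123_c2")
```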
[0085] 2. Slag Pot Inspection Method
[0086] The YOLOv7 algorithm is employed, which surpasses all known object detectors in both speed and accuracy within the 5 FPS to 160 FPS range. The YOLOv7 algorithm is an improvement upon the YOLOv4 and YOLOv5 network frameworks and uses a multi-scale prediction approach. The initial input image size is adjusted to 640*640*3; after passing through the backbone layer, three feature maps of different scales are generated through the head layer. The detection task of this invention involves four target categories: slag cans, can connectors, train tails, and train locomotives. The ground truth of each target is represented by three anchors in each grid of the feature map at various scales. A RepVGG structure is introduced, employing multi-path branching to improve model performance. An auxiliary head, Aux-Head, is added to optimize the model alongside the Lead-Head. Anchors and ground truth are matched, with three positive samples assigned to the Aux-Head and five positive samples to the Lead-Head, with a loss weight ratio of 1:4. The final result is a 9-dimensional prediction, representing bounding box coordinates (x, y, w, h), bounding box confidence, and the number of categories (4).
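The size of the prediction tensors follows directly from these numbers. The sketch below computes the shapes of the three prediction maps; the strides of 8, 16, and 32 are the usual YOLO configuration and are an assumption here, since the description only states that three scales are used.

```python
def head_output_shapes(input_size=640, strides=(8, 16, 32),
                       anchors_per_cell=3, num_classes=4):
    """Return (grid_h, grid_w, anchors, values_per_anchor) for each prediction scale."""
    per_anchor = 4 + 1 + num_classes  # box (x, y, w, h) + confidence + class scores
    return [(input_size // s, input_size // s, anchors_per_cell, per_anchor)
            for s in strides]


shapes = head_output_shapes()
```

With four categories, each anchor carries 4 + 1 + 4 = 9 values, which matches the 9-dimensional prediction described above.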
[0087] 3. Slag Pot Identification Method
[0088] 3.1 Feature Extraction Network
[0089] The ResNet50 network is a classic deep feature extraction network capable of extracting features at five different scales, reflecting different characteristics of an image. For example, shallow networks can obtain high-resolution feature information from images, facilitating the extraction of color and edge features, while deep networks can obtain stronger semantic feature information. The waste steel slag can empty / full state recognition model used in this invention first extracts features using an improved ResNet50 network, then measures the distance between these features and the features of template images in the image library. All distances are then sorted from smallest to largest, and the image with the smallest distance is selected to determine the waste steel slag can state category. Since this invention's recognition method is based on object detection, the recognition model does not require object localization. The high-resolution feature information of shallow networks has little effect on the recognition model and may even be used as noise by the network, affecting the final recognition effect. Therefore, only 1 / 16 and 1 / 32 scale features are extracted, and the stride of the 1 / 32 scale features in the ResNet50 network is adjusted to 1 to maintain consistent feature resolution at the 1 / 16 and 1 / 32 scales, facilitating subsequent feature fusion. Linear weighted adaptive fusion is then performed to better utilize the semantic information of the latter two layers.
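The linear weighted adaptive fusion can be illustrated with a small sketch. It assumes that the two feature maps have already been brought to the same shape and flattened to lists (for a 256*256 input, both the 1/16 scale and the stride-adjusted last stage are 16*16 spatially), and that the two raw weights are normalized with a softmax so that they sum to 1. The normalization method and the function names are assumptions, since the description does not specify them; in a real model the raw weights would be learnable parameters.

```python
import math


def softmax(raw):
    """Normalize raw weights so that they are positive and sum to 1."""
    peak = max(raw)
    exps = [math.exp(r - peak) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(feat_16, feat_32, raw_weights):
    """Element-wise weighted sum of two equally shaped feature vectors."""
    w16, w32 = softmax(raw_weights)
    return [w16 * a + w32 * b for a, b in zip(feat_16, feat_32)]


# Equal raw weights give a plain average of the two scales.
fused = fuse([1.0, 2.0], [3.0, 4.0], [0.0, 0.0])
```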
[0090] 3.2 Selection method for positive and negative sample pairs
[0091] The selection method of positive and negative sample pairs significantly affects the recognition performance of this invention. The batch size is set to 16, and four types of waste steel slag cans are loaded, with four images for each type. The first image loaded in each batch is used as the positive sample, resulting in a batch containing four positive sample pairs and twelve negative sample pairs. After calculating the sample distances, the maximum distance in the positive samples and the minimum distance in the negative samples are used as a pair of difficult-to-identify samples and fed into the triplet loss function for iterative training.
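The selection of the hardest pair can be sketched as follows: for an anchor, the positive sample with the largest distance and the negative sample with the smallest distance are kept. The example uses plain Euclidean distance from the Python standard library as a stand-in for the distance used in training, and the function name is illustrative.

```python
import math


def hardest_pair(anchor, positives, negatives, dist=math.dist):
    """Return (d_p, d_n): the farthest positive and the nearest negative distance."""
    d_p = max(dist(anchor, p) for p in positives)
    d_n = min(dist(anchor, n) for n in negatives)
    return d_p, d_n


d_p, d_n = hardest_pair((0.0, 0.0),
                        positives=[(1.0, 0.0), (3.0, 4.0)],
                        negatives=[(6.0, 8.0), (0.0, 2.0)])
```

In this toy case the hardest positive lies farther away than the hardest negative, which is exactly the situation the triplet loss penalizes.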
[0092] 3.3 Distance Calculation Formula
[0093] This invention provides a joint optimization distance calculation formula. The distance calculation formula used in the triplet loss function is a joint optimization calculation of Euclidean distance and cosine similarity distance, replacing the single Euclidean distance calculation. This is because Euclidean distance measures the absolute distance between features, which is directly related to the position coordinates between points, while cosine similarity distance measures the angle between spatial vectors, which pays more attention to the difference in direction. Therefore, the two distance measurement methods are selected for joint training. For example, Equation (1) is the Euclidean distance calculation formula, and Equation (2) is the cosine similarity calculation formula.
[0094] $d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$ (1)
[0095] $\cos\theta = \dfrac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$ (2)
[0096] Where $d$ represents the Euclidean distance between the two vectors, $x_i$ represents the i-th dimension of the feature vector of the waste steel slag can to be identified, $y_i$ represents the i-th dimension of the feature vector of the steel slag can template in the image library, and $\cos\theta$ represents the cosine similarity between the two feature vectors. $\sum_{i=1}^{n}$ represents the summation from i=1 to i=n, where i represents the i-th dimension and n represents the total number of dimensions.
[0097] The joint optimization distance between the vectors is then calculated as shown in Equation (3). First, α and β are initialized and normalized; they are then fed into the network as learnable parameters so that their numerical values are obtained through adaptive learning, which eliminates the need to search for the best weights by traversal.
[0098] Distance=αd+β(1-cosθ) (3)
[0099] Where Distance represents the joint optimization distance, α represents the Euclidean distance weight, and β represents the cosine distance weight.
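Equation (3) can be checked numerically with a short sketch. Here α and β are passed in as fixed numbers and normalized so that they sum to 1; in the invention they are learned by the network, and the sum-to-one normalization is an assumption made for the example.

```python
import math


def euclidean(x, y):
    """Euclidean distance d between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def joint_distance(x, y, alpha, beta):
    """Distance = alpha * d + beta * (1 - cos(theta)), with normalized weights."""
    total = alpha + beta
    alpha, beta = alpha / total, beta / total
    return alpha * euclidean(x, y) + beta * (1.0 - cosine_similarity(x, y))


# Orthogonal unit vectors: d = sqrt(2), cos(theta) = 0.
example = joint_distance([1.0, 0.0], [0.0, 1.0], alpha=1.0, beta=1.0)
```

Identical vectors give a joint distance of 0, while vectors that point in the same direction but differ in length are separated only by the Euclidean term.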
[0100] 3.4 Loss Function
[0101] This invention uses a triplet loss function different from the traditional ReID and proposes a multi-loss function joint training method. The addition of a center loss function can reduce intra-class discrepancies and narrow the feature differences between scrap steel slag cans with the same ID and shooting angle. Since the data distribution selected by the triplet loss function is not necessarily uniform, it can lead to unstable model training performance, slow convergence, long training time, and easy overfitting, ultimately resulting in training failure. Therefore, adding a label smoothing loss function to the fully connected layer can effectively improve the above problems, preventing the model from only focusing on the loss of the correct label position, and also considering the loss of other incorrect label positions, thus improving the model's learning ability and increasing its generalization ability. The triplet loss function is shown in Equation (4), the center loss function is shown in Equation (5), the label smoothing loss function is shown in Equation (6), and the total loss function is calculated as shown in Equation (7).
[0102] $L_{Tri} = [d_p - d_n + \alpha]_+$ (4)
[0103] $L_C = \dfrac{1}{2}\sum_{j=1}^{B} \lVert f_j - c_{y_j} \rVert_2^2$ (5)
[0104] $L_{ID} = -\sum_{i=1}^{N} q_i \log(p_i)$ (6)
[0105] $L = L_{ID} + L_{Tri} + \beta L_C$ (7)
[0106] Where $d_p$ and $d_n$ represent the distance between positive sample pairs and the distance between negative sample pairs, respectively, with the margin $\alpha$ set to 0.3; $y_j$ represents the label of the j-th image in a mini-batch; $c_{y_j}$ represents the $y_j$-th class center of the deep features; $f_j$ represents the features of the j-th image; $B$ represents the batch_size value; $y$ represents the true label; $p_i$ represents the predicted probability; $q_i$ represents the actual label of the i-th element; $L_{Tri}$ represents the triplet loss; $L_C$ represents the center loss; $L_{ID}$ represents the label smoothing loss; and $L$ represents the total loss. $\sum_{j=1}^{B}$ represents the sum of the distances between the features of all images in a batch and their class centers, and $\sum_{i=1}^{N}$ represents the sum of $q_i \log(p_i)$ over the N categories.
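A small numerical sketch of the three losses and their combination in Equation (7) is given below. The margin of 0.3 comes from this embodiment. The smoothing factor `eps`, the form of the smoothed labels, the half-sum-of-squares form of the center loss, and the value of β in the example are assumptions that follow common practice, since the description does not state them.

```python
import math


def triplet_loss(d_p, d_n, margin=0.3):
    """Hinge on the gap between the positive-pair and negative-pair distances."""
    return max(d_p - d_n + margin, 0.0)


def center_loss(features, labels, centers):
    """Half the sum of squared distances between each feature and its class center."""
    return 0.5 * sum(
        sum((f - c) ** 2 for f, c in zip(feat, centers[label]))
        for feat, label in zip(features, labels)
    )


def label_smoothing_loss(probs, true_idx, eps=0.1):
    """Cross-entropy against smoothed labels q_i (eps is an assumed smoothing factor)."""
    n = len(probs)
    q = [eps / n] * n
    q[true_idx] += 1.0 - eps
    return -sum(qi * math.log(pi) for qi, pi in zip(q, probs))


def total_loss(l_id, l_tri, l_c, beta):
    """L = L_ID + L_Tri + beta * L_C."""
    return l_id + l_tri + beta * l_c


l_tri = triplet_loss(d_p=1.0, d_n=0.5)
l_c = center_loss([[1.0, 1.0]], [0], {0: [0.0, 0.0]})
l_id = label_smoothing_loss([0.7, 0.1, 0.1, 0.1], true_idx=0)
loss = total_loss(l_id, l_tri, l_c, beta=0.0005)  # beta value is hypothetical
```

With smoothing enabled, the incorrect label positions also contribute to the loss, which is the behavior described above for the fully connected layer.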
[0107] 3.5 Evaluation Indicators
[0108] The evaluation metrics are Rank-k and mAP. Taking Rank-1 as an example, it measures whether each query image belongs to the same category as its first returned result in the image library, and thus represents the accuracy of the first search result for the image. Rank-5 and Rank-10 are obtained by the same calculation method. The mAP metric is the mean average precision, a commonly used evaluation metric in multi-object detection and multi-label image classification. It is the mean of the average precision (AP) values in multi-classification tasks and represents the accuracy of all search results. The calculation formula is shown in Equation (8).
[0109] $AP_c = \dfrac{\sum Precision_c}{Images_c}$
[0110] $mAP = \dfrac{\sum_{K=1}^{C} AP_K}{C}$ (8)
[0111] Where $Precision_c$ represents the precision for each individual category, $Images_c$ represents the number of images contained in each category, $AP_K$ represents the average precision of each category, and $C$ represents the total number of categories; in this embodiment, C represents 4 categories, each including 6 different angles, for a total of 24 subcategories. $\sum Precision_c$ represents the sum of the precision values for each category.
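The two metrics can be illustrated with a minimal sketch. It assumes that, for every query image, the image library has already been sorted by increasing distance and reduced to a list of category labels, and that the per-category AP values are already available; the function names and the sample labels are illustrative.

```python
def rank_k_accuracy(query_labels, ranked_gallery_labels, k=1):
    """Fraction of queries whose true label appears among the first k returned results."""
    hits = sum(1 for label, ranked in zip(query_labels, ranked_gallery_labels)
               if label in ranked[:k])
    return hits / len(query_labels)


def mean_average_precision(ap_per_category):
    """Mean of the per-category average precision values."""
    return sum(ap_per_category) / len(ap_per_category)


queries = ["0001", "0002"]
ranked = [["0001", "0003"], ["0000", "0002"]]
rank1 = rank_k_accuracy(queries, ranked, k=1)
rank2 = rank_k_accuracy(queries, ranked, k=2)
```

The second query is only matched at the second position, so it counts toward Rank-2 but not toward Rank-1.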
[0112] 3.6 Methods for Identifying Details
[0113] During the identification process, data storage begins when the locomotive is detected, signifying the start of identification. Data storage ends when the train tail is detected; the video data is then saved and the identification result is sent. In this invention, identification of a waste steel slag can begins once two-thirds of the can has entered the camera's field of view, so that more features of the can's interior are captured and the accuracy of the identification results is improved. The model identifies the state of the steel slag can at a frequency of one frame out of every four. Statistics show that the number of identification frames for each steel slag can is between 50 and 70. The judgment results of all identification frames for a steel slag can are put to a vote, and the state with the most votes is selected as the final identification result and saved.
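The per-can voting step can be sketched in a few lines. The state strings are illustrative, and a tie is resolved in favor of the state that was seen first, which is a property of this sketch rather than something the description specifies.

```python
from collections import Counter


def vote_state(frame_results):
    """Return the state that received the most votes across a can's recognition frames."""
    return Counter(frame_results).most_common(1)[0][0]


# Example: 60 recognition frames for one can, a minority of which are misjudged.
final_state = vote_state(["empty"] * 40 + ["non-empty"] * 20)
```

Voting over 50 to 70 frames means that a few misjudged frames, for example blurred ones, do not change the result for the can.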
[0114] Since the receiver needs to receive the judgment result for each can, the correspondence between the cans and the results must be handled. During target detection, identification of the connection points between cans was added. Because the cans are relatively large, the camera cannot capture two cans entering the lens simultaneously. Therefore, when a connection point is detected, the previous can is considered to have passed, and the cans that follow are treated as the next cans. To enhance the robustness of the algorithm, this embodiment also adds a logical judgment for determining changes in the can number, as shown in Figure 7:
[0115] In Figure 7, assume that the image is captured by the camera and that the circular container represents a scrap steel slag container moving downwards. The target detection model frames the container with a bounding box, and the xmin, ymin, xmax, and ymax of this box correspond to the coordinates of its left, top, right, and bottom sides. As the container moves downwards, (ymin + ymax) / 2 continuously increases from 0 to n, where n is the maximum y-value of the camera image. When a new container enters, (ymin + ymax) / 2 becomes a very small value. Therefore, the change in (ymin + ymax) / 2 is incorporated into the container number change detection as an auxiliary method for identifying container number changes.
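The auxiliary rule based on (ymin + ymax) / 2 can be sketched as follows. The container number is advanced whenever the vertical center of the bounding box falls sharply compared with the previous frame; the `drop_ratio` of 0.5 is a hypothetical tuning value, and the connector-based rule is omitted to keep the example short.

```python
def assign_can_numbers(boxes, drop_ratio=0.5):
    """Assign a container number to each detection box (xmin, ymin, xmax, ymax).

    A new container is assumed when the vertical box center drops below
    `drop_ratio` times the previous center (hypothetical threshold).
    """
    numbers = []
    can_number = 0
    prev_center = None
    for _xmin, ymin, _xmax, ymax in boxes:
        center_y = (ymin + ymax) / 2
        if prev_center is None or center_y < prev_center * drop_ratio:
            can_number += 1        # first container, or a new one entering at the top
        numbers.append(can_number)
        prev_center = center_y
    return numbers


centers = [100, 200, 300, 400, 50, 150, 250, 40, 120]
boxes = [(0, c - 20, 100, c + 20) for c in centers]
numbers = assign_can_numbers(boxes)
```

In the example the center rises steadily for each container and falls back near the top of the image twice, so three containers are counted.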
[0116] After the tanker truck passes by, the present invention transmits the identification results to the receiver in the form of pulse signals via Modbus, guiding the next step of slag spraying. The identification results are visualized using the Flask framework and displayed in real time.
[0117] 4. Lightweight Model Technology
[0118] A major challenge in the research of identifying the empty / full state of scrap steel slag containers is real-time performance. Lightening the network model is crucial for improving its processing speed. "Pruning" is a neural network compression technique that removes weights of low importance while minimizing the impact on overall model accuracy. Furthermore, the significant reduction in parameters improves processing speed, ultimately enhancing overall model performance. The amplitude iterative pruning algorithm based on the lottery hypothesis argues that while traditional pruning techniques achieve lightweighting, the resulting sparse network is difficult to train from scratch; only by inheriting weights from a larger network can similar performance be achieved. The lottery hypothesis suggests that within an initially dense neural network, finding subnetworks and retraining them can achieve similar accuracy to the original network within a similar number of iterations, with the parameter count reduced to 5%–10% of the initial dense network. This invention employs an amplitude iterative pruning algorithm based on the lottery hypothesis to select the optimal subnetwork from a ReID-based scrap steel slag container empty / full state model. The specific steps are shown in Figure 8.
[0119] First, initial weights W0 are assigned to the initial ResNet50 network. After training on the waste steel slag container data converges, the trained weight parameters are obtained, and 10% of the connections are pruned according to the set pruning rate, giving the mask m(1). The remaining subnetwork is then reset to the initial weights W0 to obtain m(1)⊙W0, and the above procedure is repeated to obtain m(2)⊙W0, and so on. After the above operations had been performed twelve times, the accuracy began to decrease. After twelve unstructured pruning operations, the network size was 28.2% of the initial network size.
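The pruning loop and the 28.2% figure can be reproduced with a toy sketch. Each round trains the masked network starting from the initial weights W0, removes the 10% of surviving weights with the smallest magnitude, and rewinds the survivors to W0. The `train` callback is a hypothetical stand-in for real training, and the flat list of weights replaces the actual ResNet50 parameters.

```python
def iterative_magnitude_pruning(w0, train, rate=0.10, rounds=12):
    """Return a 0/1 mask after `rounds` of prune-and-rewind on the weights w0."""
    mask = [1] * len(w0)
    for _ in range(rounds):
        # Rewind: surviving weights always restart from their initial values.
        trained = train([m * w for m, w in zip(mask, w0)], mask)
        alive = sorted((abs(w), i) for i, (w, m) in enumerate(zip(trained, mask)) if m)
        n_prune = int(len(alive) * rate)
        for _, i in alive[:n_prune]:
            mask[i] = 0            # drop the smallest-magnitude survivors
    return mask


def identity_train(weights, mask):
    """Hypothetical placeholder: returns the weights unchanged instead of training."""
    return weights


w0 = [float(i + 1) for i in range(10000)]
mask = iterative_magnitude_pruning(w0, identity_train)
remaining = sum(mask) / len(mask)
```

Removing 10% of the surviving weights twelve times leaves 0.9^12 ≈ 0.282 of the parameters, which agrees with the 28.2% network size stated above.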
[0120] This invention completes the first stage of the task of identifying the empty and full states of waste steel slag cans using target detection technology. Training samples are divided into four categories: slag cans, can connectors, train tails, and train heads. The total number of epochs is set to 300, the early stopping parameter is set to 75 epochs, and the confidence threshold is set to 0.8. Experiments show that the detection model achieves a precision of 98.1% and a recall of 95.4%. Since the detection task is only the first stage of the identification task, aiming to locate and crop the waste steel slag cans to reduce the impact of background noise on the identification model, maintaining precision and recall above 90% is sufficient to meet the requirements of the second stage of this invention.
[0121] After model lightweighting and pruning, this invention achieved 100% Rank-1, Rank-5, and Rank-10 accuracy, with an mAP of 98.7%, representing a 12.1% improvement over the baseline model. Testing on 100 trains, comprising 500 waste steel slag cans at nighttime and 500 in the daytime, revealed two recognition errors: one nighttime image from blurred video and one daytime image of an empty can whose walls were severely oxidized due to long-term use. Compared to known results from other recognition systems, this reduced the number of incorrectly identified cans by 5 and substantially reduces the manpower required compared to manual operation. The recognition results are shown in Figure 9.
[0122] This invention uses waste steel slag tank target detection technology to detect the target, and then extracts the internal features of the tank through an improved ResNet50 network. The 1 / 16 and 1 / 32 scale features are fused, and visualization reveals that the fused features contain some semantic information at the 1 / 16 scale and also rich semantic information at the 1 / 32 scale. By comparing the recognition performance of the proposed model with the baseline model in complex weather and lighting scenarios, it can be found that the improved feature extraction network of this invention improves the mAP by 7.2%. The comparison results are shown in Table 1, where S1 to S5 represent the five scale features extracted by the ResNet50 network.
[0123] Table 1 Experimental results of different feature extraction models
[0124]
[0125] This invention designs a joint optimization distance formula, adjusting the distance calculation formula in the triplet loss to a joint optimization of Euclidean distance and cosine distance. The final total distance is obtained by adaptively weighting and linearly combining the two distances. To further explore the impact of the proportion of each distance in the joint distance on the experimental results, and considering that the normalized Euclidean distance and cosine distance are of the same order of magnitude, comparative experiments were conducted by varying the Euclidean distance decision factor from large to small and the cosine distance decision factor from small to large. The final experimental results demonstrate that the performance of this model is improved by 4.2% compared to the original model using only Euclidean distance.
[0126] Table 2. Experimental Results of the Weighting of Two Joint Optimization Distance Formulas
[0127]
[0128]
[0129] Because the negative samples include scrap steel slag cans with different IDs as well as scrap steel slag cans with the same ID but different entry locations, the data input for training may be uneven. At that stage the model does not yet possess high recognition accuracy, so training with only the triplet loss function yields poor results, and a classification loss function needs to be added for joint training. Experiments show that using the labelsmooth_loss function is better than using the softmax_loss function because the former allows the model to pay slight attention to the weights of low-probability distributions, ensuring that some low-probability positive samples are not ignored. Experiments also demonstrate that adding a center loss function to reduce the distance between similar samples slightly improves the model's recognition ability. Therefore, for the selection of loss functions, this invention adopts a joint training approach using triplet loss, center loss, and label smoothing loss. As an ablation experiment, the model was tested under the same hyperparameter conditions using only triplet loss, triplet loss + softmax cross-entropy loss, and triplet loss + softmax cross-entropy loss + center loss. The results show that the proposed method achieves an mAP of 98.7%, which is 40.2% higher than using only triplet loss, 10.3% higher than triplet loss + softmax cross-entropy loss, and 5.6% higher than triplet loss + softmax cross-entropy loss + center loss.
[0130] After the model achieves the optimal result based on the above experiments, this invention compresses the waste steel slag tank empty / full state recognition model based on amplitude iterative pruning technology. Pruning rates are set to 20%, 40%, 60%, and 80%, respectively, with pruning performed after each iteration, each iteration containing 100 epochs. Experiments show that at a pruning rate of 20%, the model obtained at the 12th pruning iteration performs best. Similarly, the optimal number of pruning iterations is 4, 3, and 1 for pruning rates of 40%, 60%, and 80%, respectively. The model test results under different pruning rates are shown in Table 3. Table 3 shows that when the number of remaining parameters is similar, the model with a lower pruning rate and more pruning iterations significantly outperforms the model with a higher pruning rate and fewer pruning iterations. This demonstrates the importance of recovery training after amplitude iterative pruning: pruning a small proportion at a time, with repeated pruning and recovery training, can effectively screen out the optimal sparse subnetwork. The models with pruning rates of 20% and 40% have similar recognition accuracy, but the model with a pruning rate of 20% has fewer remaining parameters and a better lightweight compression effect on the network. Therefore, this invention selects the model with a pruning rate of 20% and 12 pruning iterations as the final model.
[0131] Table 3. Model test results under different pruning rates
[0132]
[0133] In summary, the method and system for identifying the empty or full state of waste steel slag containers of the present invention can identify the empty or full state of waste steel slag containers under various environmental conditions, with high accuracy, improved work efficiency, reduced labor costs, and enhanced safety.
[0134] The foregoing description of specific exemplary embodiments of the invention is for illustrative and explanatory purposes. These descriptions are not intended to limit the invention to the precise forms disclosed, and it will be apparent that many changes and variations can be made in accordance with the foregoing teachings. The exemplary embodiments were chosen and described in order to explain the specific principles of the invention and its practical application, thereby enabling those skilled in the art to implement and utilize various different exemplary embodiments of the invention, as well as various different choices and variations. The scope of the invention is intended to be defined by the claims and their equivalents.
Claims
1. A method for identifying the empty or full state of a waste steel slag container, characterized in that, The method, applicable to various environmental conditions, includes: Acquire videos of waste steel slag tanks, perform preprocessing, and generate target detection datasets and state recognition datasets; The target detection dataset is input into the target detection model for training to generate the detection model; The state recognition dataset is input into the recognition model for training to generate the state recognition model; The video of the waste steel slag container to be tested is processed to generate the test data; The data to be detected is input into the detection model for detection, generating the target to be identified; The target to be identified is input into the state recognition model to generate a recognition result; The identification results are input into the interactive interface module to guide the slag spraying operation; The process of inputting the state recognition dataset into the recognition model for training to generate the state recognition model includes: The recognition model reads image data from the state recognition dataset in batches and is trained by constructing positive and negative sample pairs; Features are extracted using the ResNet50 network, and the 1 / 16 scale features and 1 / 32 scale features output by the ResNet50 network are fused. The 1 / 16 scale features are used to capture local details including tank wall corrosion patterns and slag texture, while the 1 / 32 scale features are used to understand the overall distribution pattern of slag inside the tank. The distance between positive and negative sample pairs is calculated using a preset algorithm, which is a joint optimization distance formula: Distance=αd+β(1-cosθ), where d is the Euclidean distance, cosθ is the cosine similarity, and α and β are the weighting coefficients; The model is optimized using a preset method, which includes: initializing α and β, normalizing α and β and feeding them into the network as learnable parameters, training the network using a combination of center loss and triplet loss so that the numerical weights of α and β are learned adaptively, and introducing label smoothing loss in the fully connected layer to prevent model overfitting; The state recognition model is generated by compressing the recognition model using amplitude iterative pruning.
2. The method for identifying the empty / full state of a waste steel slag container as described in claim 1, characterized in that, The process of acquiring and preprocessing video of waste steel slag tanks to generate target detection and state recognition datasets includes: The video of the waste steel slag container was acquired and the data was cleaned to obtain image data; Based on the image data, an object detection dataset and a state recognition dataset are generated.
3. The method for identifying the empty / full state of a waste steel slag container as described in claim 1, characterized in that, The object detection dataset is input into the object detection model for training, and the generated detection model includes: The target detection dataset is input into the YOLOv7 target detector of the target detection model for training. Multi-scale prediction of slag pot classification regression boxes is performed to generate the detection model, and the detection model is compressed by pruning.
4. The method for identifying the empty / full state of a waste steel slag container as described in claim 1, characterized in that, The video of the waste steel slag being tested is processed to generate the data to be tested, including: The data from the video of the waste steel slag container to be tested is processed by extracting one frame every four frames, and the data is adjusted to a preset image size to generate the data to be tested.
5. The method for identifying the empty / full state of a waste steel slag container as described in claim 1, characterized in that, The data to be detected is input into the detection model for detection, generating targets to be identified, including: The data to be detected is input into the detection model, and the detection model crops the detected slag pot targets to generate the target to be identified.
6. The method for identifying the empty / full state of a waste steel slag container as described in claim 1, characterized in that, The identification results are input into the interactive interface module to guide the slag spraying operation, including: The identification results are input into the interactive interface module and visualized through a web application framework, thereby guiding the slag spraying operation.
7. A system for identifying the empty / full state of a waste steel slag container, based on the method for identifying the empty / full state of a waste steel slag container as described in any one of claims 1 to 6, characterized in that, The system includes: The first generation module is used to acquire videos of waste steel slag tanks, perform preprocessing, and generate target detection datasets and state recognition datasets. The second generation module is used to input the target detection dataset into the target detection model for training and to generate the detection model. The third generation module is used to input the state recognition dataset into the recognition model for training and generate the state recognition model. The fourth generation module is used to process the video of the waste steel slag being tested and generate the data to be tested. The fifth generation module is used to input the data to be detected into the detection model for detection and generate the target to be identified; The sixth generation module is used to input the target to be identified into the state recognition model and generate a recognition result; and The visualization module is used to input the recognition results into the interactive interface module to guide the slag spraying operation.
8. The system for identifying the empty / full state of a waste steel slag container as described in claim 7, characterized in that, The process of acquiring and preprocessing video of waste steel slag tanks to generate target detection and state recognition datasets includes: The video of the waste steel slag container was acquired and the data was cleaned to obtain image data; Based on the image data, an object detection dataset and a state recognition dataset are generated.