Precious metal feeding anti-theft method, system and device based on visual recognition and medium
By constructing spatiotemporal graph nodes and generating dual-label datasets, combined with a multi-task fusion model, the problems of high false alarm rate and low generalization of existing precious metal feeding anti-theft technologies are solved. Real-time identification and hierarchical handling of precious metal feeding are realized, improving the accuracy of anti-theft strategies and the generalization of models.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGDONG CHICO ELECTRONIC INC
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-19
AI Technical Summary
Existing precious metal feeding anti-theft technologies cannot meet the requirements of high precision, high real-time performance, and low false alarm rate in industrial scenarios. They also cannot perform independent feature association, resulting in high false alarm rate and low generalization.
By using a visual recognition-based method, feeders, materials, and equipment are defined as nodes in a spatiotemporal graph. A fused spatiotemporal graph is constructed to generate a dual-label dataset with feed status labels and variability labels. A multi-task fusion model is then used for prediction to determine a tiered anti-theft strategy.
It enables real-time identification, early prediction, and tiered handling of abnormal precious metal feeding behavior, reducing false alarm rates, improving the accuracy of anti-theft strategies and the generalization of models, and adapting to the needs of different security levels.
Figure CN122244796A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of anti-theft technology for metal feeding, and in particular to a method, system, device and medium for anti-theft of precious metal feeding based on visual recognition.
Background Technology
[0002] Precious metal feeding is a core process in the precious metal deep processing industry. The weight, type, and feeding method of the raw materials directly determine production efficiency. Moreover, precious metal raw materials are valuable and prone to problems such as pilferage, substitution, and underfeeding, so safety monitoring of the feeding process is extremely important.
[0003] Current anti-theft solutions for precious metal feeding mainly fall into three categories: manual inspection, standalone visual monitoring, and simple sensor threshold alarms. Each detects a single feature in isolation and cannot associate features with one another, so these solutions fail to meet the anti-theft requirements of high precision, high real-time performance, and low false alarm rate in industrial scenarios.
Summary of the Invention
[0004] The main objective of this application is to propose a method, system, device, and medium for preventing theft of precious metals based on visual recognition, so as to solve one or more technical problems existing in the prior art, and at least provide a beneficial option or create conditions.
[0005] To achieve the above objectives, one aspect of this application proposes a method for preventing theft in precious metal feeding based on visual recognition, the method comprising:
Obtaining feeding visual data of the feeding behavior and constructing a feeding coordinate system; based on the feeding coordinate system, normalizing the feeding visual data to obtain a structured feature vector;
Based on the extracted structured feature vector, defining the feeder, materials, and equipment as spatiotemporal graph nodes, and establishing temporal edges and spatial edges based on the spatiotemporal graph nodes;
Based on the spatiotemporal graph nodes, the temporal edges, and the spatial edges, constructing a fused spatiotemporal graph, and outputting a graph structure feature vector through the fused spatiotemporal graph;
Based on the graph structure feature vector, generating a dual-label dataset with a feeding status label and a variability label;
Inputting the dual-label dataset into the trained multi-task fusion model, which outputs the predicted values of the feeding status label and the variability label, and determining the corresponding extension domain based on the predicted values;
Based on the corresponding extension domain, determining the corresponding hierarchical anti-theft strategy, so as to iteratively optimize the multi-task fusion model.
[0006] In some embodiments, acquiring the visual data of the feeding behavior includes:
Using the established target detection model, detecting the feeder, materials and equipment in the feeding area as detection targets, and outputting a data set of candidate regions for each detection target;
Based on the candidate regions of each detection target, identifying the core action key points of the feeding behavior using the established extraction model and the data set, outputting the visual recognition data of each key point, and extracting the visual features of the material to obtain visual material data;
Acquiring the equipment's working data, and aligning the visual recognition data and the visual material data with the working data according to the timestamp; the aligned visual recognition data, visual material data, and working data form the feeding visual data.
[0007] In some embodiments, normalizing the feeding visual data to obtain a structured feature vector includes:
Cleaning the feeding visual data, and constructing the feeding coordinate system with the center of the feeding port in the feeding area as the absolute reference origin, wherein the feeding visual data includes visual recognition data, visual material data and working data;
Normalizing the coordinates of the visual recognition data by converting them into relative coordinates in the feeding coordinate system, numerically normalizing the visual material data and the working data, and eliminating interference factors;
Standardizing all cleaned and normalized data to obtain the structured feature vector.
[0008] In some embodiments, establishing temporal edges and spatial edges based on the spatiotemporal graph nodes includes:
Extracting the structured feature vectors of a set number of consecutive time steps to form a time series segment of the feeding behavior, and taking the feeder, material and equipment as the spatiotemporal graph nodes, each of the three spatiotemporal graph nodes having corresponding node features;
Based on the time series segment, within the same frame, establishing the physiological connection relationship of the feeder's key points according to the node features of the feeder's spatiotemporal graph node, and establishing the spatial association relationship between the feeder, material, and equipment according to the node features of the three spatiotemporal graph nodes, to form the spatial edges;
Based on the time series segment, obtaining the temporal edges across adjacent frames according to the motion trajectory of the same node feature.
[0009] In some embodiments, the feeding visual data includes visual recognition data, visual material data, and working data, and the corresponding node features of the three spatiotemporal graph nodes are as follows:
The visual recognition data and the relative coordinates obtained by normalizing the visual recognition data in the feeding coordinate system are used as the node features of the feeder;
The numerically normalized visual material data is used as the node features of the material, and the working data that has undergone numerical normalization and standardization is used as the node features of the equipment.
[0010] In some embodiments, generating a dual-label dataset with feeding status labels and variability labels includes:
Based on the graph structure feature vector, calculating the three-dimensional score of posterior uncertainty, boundary proximity, and sparsity for each sample; using the set clustering algorithm to divide the three-dimensional scores into a high-score cluster and a low-score cluster, and assigning labels to the two clusters to obtain the variability labels;
Based on the graph structure feature vector, using the set visual recognition algorithm to identify the abnormal feeding state and the normal feeding state, and assigning labels to the two states to obtain the feeding status labels;
Combining the graph structure feature vector with the corresponding feeding status label and variability label to obtain the dual-label dataset.
[0011] In some embodiments, determining the corresponding extension domain based on the predicted value includes: Based on the predicted value of the feeding status label, the feeding status is determined to be a normal feeding state, and based on the predicted value of the variability label, the variability label is determined to be a low variability label. Based on the normal feeding state and the low variability label, the corresponding extension domain is determined, from the four-quadrant extension domain, to be the positive quantitative-change domain, so as to determine the hierarchical anti-theft strategy set for the positive quantitative-change domain. The four-quadrant extension domain includes the negative quantitative-change domain, the negative qualitative-change domain, the positive qualitative-change domain, and the positive quantitative-change domain.
[0012] To achieve the above objectives, another aspect of this application proposes a visual recognition-based precious metal feeding anti-theft system, the system comprising:
A data processing module, used to acquire feeding visual data of the feeding behavior, construct a feeding coordinate system, and normalize the feeding visual data based on the feeding coordinate system to obtain a structured feature vector;
An association establishment module, used to define the feeder, material and equipment as spatiotemporal graph nodes based on the extracted structured feature vector, and to establish temporal edges and spatial edges based on the spatiotemporal graph nodes;
A fused spatiotemporal graph construction module, used to construct a fused spatiotemporal graph based on the spatiotemporal graph nodes, the temporal edges, and the spatial edges, and to output a graph structure feature vector through the fused spatiotemporal graph;
A dual-label construction module, used to generate a dual-label dataset with feeding status labels and variability labels based on the graph structure feature vector;
An extension domain decision module, used to input the dual-label dataset into the trained multi-task fusion model, output the predicted values of the feeding status label and the variability label, and determine the corresponding extension domain based on the predicted values;
An execution feedback module, used to determine the corresponding hierarchical anti-theft strategy based on the corresponding extension domain, so as to iteratively optimize the multi-task fusion model.
[0013] To achieve the above objectives, another aspect of this application provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the above-described method.
[0014] To achieve the above objectives, another aspect of the embodiments of this application proposes a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described method.
[0015] The embodiments of this application include at least the following beneficial effects: This application provides a method, system, device, and medium for anti-theft of precious metal feeding based on visual recognition. The solution defines the feeder, material, and equipment as nodes in a spatiotemporal graph, realizing integrated monitoring of all elements: people, materials, and equipment. It fuses the actions of the feeder, the state of the precious metal material, and the parameters of the feeding equipment into spatiotemporal correlation features. This addresses the false alarms that arise from looking only at actions without materials, or only at values without behavior, and overcomes the limitations of single-feature detection in existing technologies. Through a dual-label dataset of feeding status labels and variability labels, together with extension domain decision-making, it realizes early prediction of abnormal behavior. Based on the dual-label results, the corresponding extension domain is determined and the hierarchical anti-theft strategy is executed, which adapts to the precious metal feeding needs of different security levels, improves the accuracy of the anti-theft strategy, and realizes differentiated hierarchical handling. The solution thus realizes real-time identification, early prediction, and hierarchical handling of abnormal behavior in precious metal feeding. At the same time, a feedback iterative optimization mechanism improves the generalization of the model and reduces the false alarm rate and industrial deployment costs, addressing the passive alarms, high false alarm rate, low generalization, and insufficient integration of all elements found in existing technologies.
Attached Figure Description
[0016] Figure 1 is a flowchart of the visual recognition-based precious metal feeding anti-theft method provided in an embodiment of this application;
Figure 2 is a schematic diagram of the four-quadrant extension domain provided in the embodiments of this application;
Figure 3 is a schematic diagram of the structure of the visual recognition-based precious metal feeding anti-theft system provided in the embodiments of this application;
Figure 4 is a schematic diagram of the hardware structure of the electronic device provided in the embodiments of this application.
Detailed Implementation
[0017] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to limit it. In the following description, when referring to the accompanying drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with those of this application; they are merely examples of apparatuses and methods consistent with some aspects of the embodiments of this application as detailed in the appended claims.
[0018] It is understood that the terms “first,” “second,” etc., as used in this application may describe various concepts, but unless otherwise stated, these concepts are not limited by these terms. These terms are only used to distinguish one concept from another. For example, without departing from the scope of the embodiments of this application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word “if” as used herein may be interpreted as “at the time of,” “when,” or “in response to a determination.”
[0019] As used in this application, the terms "at least one", "multiple", "each", and "any" have the following meanings: "at least one" includes one, two or more; "multiple" includes two or more; "each" refers to every one of the corresponding multiple; and "any" refers to any one of the multiple.
[0020] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application.
[0021] Figure 1 is an optional flowchart of the visual recognition-based precious metal feeding anti-theft method provided in an embodiment of this application. The method in Figure 1 may include, but is not limited to, steps S100 to S600.
[0022] Step S100: Obtain feeding visual data of the feeding behavior and construct a feeding coordinate system. Based on the feeding coordinate system, normalize the feeding visual data to obtain a structured feature vector.
[0023] Step S200: Based on the extracted structured feature vectors, define the feeder, materials, and equipment as spatiotemporal graph nodes, and establish time edges and spatial edges based on the spatiotemporal graph nodes.
[0024] Step S300: Based on the spatiotemporal graph nodes, temporal edges, and spatial edges, a fused spatiotemporal graph is constructed. Through the fused spatiotemporal graph, the graph structure feature vector is output.
[0025] Step S400: Generate a dual-label dataset with feeding status label and variability label based on the graph structure feature vector.
[0026] Step S500: Input the dual-label dataset into the trained multi-task fusion model, output the predicted values of the feeding status label and the variability label, and determine the corresponding extension domain based on the predicted values.
[0027] Step S600: Determine the corresponding hierarchical anti-theft strategy based on the corresponding extension domain, so as to iteratively optimize the multi-task fusion model.
[0028] Steps S100 to S600 in this embodiment define the feeder, material, and equipment as spatiotemporal graph nodes to achieve integrated monitoring of all elements: personnel, materials, and equipment. The feeder's actions, the state of the precious metal material, and the parameters of the feeding equipment are fused into spatiotemporal correlation features, which addresses the false alarms caused by relying solely on actions without considering the material, or solely on numerical values without considering behavior, and overcomes the limitations of single-feature detection in existing technologies. Through a dual-label dataset of feeding status labels and variability labels, together with extension domain decision-making, abnormal behavior can be predicted in advance. Based on the dual-label results for feeding status and variability, the corresponding extension domain is determined and a hierarchical anti-theft strategy is executed, which adapts to the precious metal feeding requirements of different security levels, improves the accuracy of the anti-theft strategy, and achieves differentiated hierarchical handling. This enables real-time identification, early prediction, and hierarchical handling of abnormal precious metal feeding behavior. At the same time, a feedback iterative optimization mechanism improves the model's generalization ability and reduces false alarm rates and industrial deployment costs, addressing the core problems of existing technologies: passive alarms, high false alarm rates, low generalization, and insufficient integration of all elements.
[0029] Through techniques such as spatial normalization, 3σ outlier removal, and MediaPipe key point visibility filtering, the method adapts well to interference factors in industrial scenarios, such as changes in lighting, material occlusion, personnel protective equipment, and momentary jumps in sensor readings, ensuring stable operation in complex industrial environments.
[0030] In some embodiments of S100, based on the acquisition sensors installed in the feeding area, data is collected through the acquisition sensors, and the three core detection targets in the feeding area, namely the feeder, precious metal materials and feeding equipment, are detected using the established target detection model, and a set of candidate areas for each target is output.
[0031] The target detection model may be the YOLOv5 target detection model; other detection models may also be used, and this application does not limit the choice. The feeding equipment includes an electronic scale, a feeding port and a material box.
[0032] Based on the candidate regions, the established extraction model crops the feeder and extracts 21 key points of the hand and 12 key points of the upper limbs and torso. The normalized coordinates and visibility information of each key point are output, and the confidence of the visibility information is extracted, to form the visual recognition data.
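As an illustration of how the visibility information can be used for filtering, the following minimal Python sketch marks low-visibility key points as missing so that a later cleaning step can fill them in. The dictionary layout and the 0.5 threshold are assumptions made for this example; the application does not specify them.

```python
def filter_by_visibility(keypoints, threshold=0.5):
    """Replace key points whose visibility is below the threshold with None.

    Each key point is assumed to be a dict with a "visibility" entry in [0, 1];
    both the layout and the threshold value are hypothetical.
    """
    return [kp if kp["visibility"] >= threshold else None for kp in keypoints]
```

A key point replaced by None is then handled like any other missing value during data cleaning.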
[0033] Simultaneously, visual features of precious metal materials are extracted to obtain visual material data.
[0034] The visual material data includes material type, material quantity, and location characteristics. It may also include other data, which this application does not limit. The extraction model can be a MediaPipe Pose or MediaPipe Hands key point extraction model.
[0035] The operating data of the feeding equipment is acquired; the visual recognition data, visual material data, and operating data are aligned by timestamp and stored in the edge database to form the feeding visual data. Data is acquired once every 200 ms to balance real-time performance and computational load.
[0036] The operating data includes the weight of the electronic scale and the status of the feeding port. Other data may also be included, which are not limited in this application.
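The timestamp alignment of the three data streams can be sketched as a nearest-timestamp join. The sketch below is only one possible reading: the `(timestamp_ms, payload)` tuple layout, the function names, and the half-period tolerance are assumptions of this example, while the 200 ms period comes from the text.

```python
from bisect import bisect_left

SAMPLE_PERIOD_MS = 200  # acquisition interval stated in the text

def nearest(timestamps, t):
    """Return the index of the timestamp closest to t (timestamps sorted ascending)."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1

def align(visual, material, working, tolerance_ms=SAMPLE_PERIOD_MS // 2):
    """Join three (timestamp_ms, payload) streams on the visual stream's clock.

    A frame without a partner inside the tolerance gets None, so that the
    cleaning step can treat it as a missing value. The tolerance is a
    hypothetical choice, not taken from the application.
    """
    def lookup(stream, t):
        if not stream:
            return None
        ts = [s[0] for s in stream]
        j = nearest(ts, t)
        return stream[j][1] if abs(ts[j] - t) <= tolerance_ms else None

    return [
        {"t": t, "visual": v, "material": lookup(material, t), "working": lookup(working, t)}
        for t, v in visual
    ]
```

Using the visual stream as the reference clock is a design choice of the sketch; any of the three streams could serve as the reference.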
[0037] Data cleaning is achieved by using the 3σ principle to remove outliers and using linear interpolation to fill in missing values.
[0038] Outliers can be momentary jumps in the electronic scale reading and / or key point detection errors. Missing values can result from key points lost to temporary occlusion or parameters lost to a temporary sensor disconnection. This application does not restrict the types of these values.
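The cleaning step can be illustrated with a short Python sketch that applies the 3σ rule to a single numeric series, such as the electronic scale weight, and then fills the resulting gaps by linear interpolation. Representing missing readings as None, computing the statistics over the whole series, and filling gaps at either end with the nearest valid value are assumptions of this example.

```python
from statistics import mean, pstdev

def three_sigma_mask(values):
    """Mark readings farther than 3 standard deviations from the mean as missing (None)."""
    present = [v for v in values if v is not None]
    mu, sigma = mean(present), pstdev(present)
    return [
        None if v is None or (sigma > 0 and abs(v - mu) > 3 * sigma) else v
        for v in values
    ]

def interpolate(values):
    """Fill None gaps linearly between the nearest valid neighbours.

    Gaps at either end are filled with the nearest valid value
    (an assumption; the text only mentions linear interpolation).
    """
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return out
    for i in range(len(out)):
        if values[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out
```

Note that with very short series a single spike inflates the standard deviation enough to escape the 3σ test, so the rule is more meaningful over a window of a dozen or more readings.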
[0039] A three-dimensional feeding coordinate system is constructed with the center of the feeding port as the absolute reference origin.
[0040] Based on the visual recognition data, the 2D normalized image coordinates of each core key point of the feeder are converted into relative coordinates in the feeding coordinate system to achieve spatial normalization.
[0041] Based on visual material data and work data, the material type, quantity, location characteristics, electronic scale weight, and feeding port status are numerically normalized. The min-max normalization formula is used to convert them into normalized values in the range of [0,1], eliminating spatial interference factors such as camera installation location, feeding personnel height, and material specifications, and ensuring the consistency and comparability of features.
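A minimal Python sketch of the two normalizations follows. It assumes the key point and the feeding-port center are already expressed in the same coordinate units and simply translates the key point; how depth is obtained for the three-dimensional system is not stated in the text and is not modeled here. The min-max helper implements x' = (x - x_min) / (x_max - x_min); mapping a constant series to 0.0 is a choice made for this example.

```python
def to_feeding_frame(point, origin):
    """Translate a key point so that the feeding-port center becomes the origin."""
    return tuple(p - o for p, o in zip(point, origin))

def min_max(values):
    """Scale values into [0, 1] using x' = (x - x_min) / (x_max - x_min).

    A constant series has no range; it is mapped to 0.0 here (an assumption).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For example, `min_max([100.0, 150.0, 200.0])` yields `[0.0, 0.5, 1.0]`.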
[0042] All cleaned and normalized feature data are standardized to form structured feature vectors.
[0043] In some embodiments of S200, a set number of structured feature vectors are extracted according to the time series to form a time sequence segment of feeding behavior. The feeder, material and equipment are used as nodes in the spatiotemporal graph. Each node has corresponding node features, and the node features carry normalized coordinates and visibility information.
[0044] Based on temporal segments, node features within the same frame are extracted. For the spatiotemporal graph nodes of the feeder, physiological connections (shoulder-elbow-wrist) are established for the feeder's key points. For the three spatiotemporal graph nodes of the feeder, material, and equipment, spatial relationships between the feeder, material, and equipment are established. Spatial edges are obtained through physiological and spatial connections.
[0045] Based on time segments, the time edge is the motion trajectory of the same node in adjacent frames.
[0046] The motion trajectory can be the movement trajectory of the hand from the electronic scale to the feeding port and / or the change trajectory of the material weight.
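The node and edge definitions above can be sketched as plain Python data structures. In this sketch a node is identified by a `(frame_index, name)` pair; the key point names in `SKELETON`, the `"feeder:"` prefix for key-point-level links, and the fully connected feeder/material/equipment association are illustrative assumptions, and edge weights are omitted.

```python
from itertools import combinations

NODES = ("feeder", "material", "equipment")
# Physiological chain named in the text; the key point names are illustrative.
SKELETON = [("shoulder", "elbow"), ("elbow", "wrist")]

def build_graph(num_frames):
    """Return (nodes, spatial_edges, temporal_edges) for one time series segment.

    Spatial edges connect elements inside one frame; temporal edges connect
    the same node in adjacent frames.
    """
    nodes, spatial, temporal = [], [], []
    for t in range(num_frames):
        nodes += [(t, n) for n in NODES]
        # Pairwise feeder/material/equipment association within the frame.
        spatial += [((t, a), (t, b)) for a, b in combinations(NODES, 2)]
        # Key-point-level physiological links inside the feeder node.
        spatial += [((t, "feeder:" + a), (t, "feeder:" + b)) for a, b in SKELETON]
        if t > 0:
            temporal += [((t - 1, n), (t, n)) for n in NODES]
    return nodes, spatial, temporal
```

A three-frame segment therefore has 9 nodes, 15 spatial edges (5 per frame), and 6 temporal edges.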
[0047] In some embodiments of S300, a fused spatiotemporal graph of people, materials, and equipment is constructed based on the definition of spatiotemporal graph nodes and edges, realizing the preliminary correlation between the spatial structural features and temporal motion features of material feeding behavior.
[0048] The final output of the fused spatiotemporal graph is a graph structure feature vector with node features and edge weights. Its dimension and numerical range are determined by the structured feature vector in S100, ensuring that it can be directly input into S400 for dual labeling without additional format conversion.
[0049] In some embodiments of S400, the variability label distinguishes this solution from traditional technical solutions that only determine whether the status is normal or abnormal. Determining the variability label amounts to algorithmically quantifying the variability of the feeding state from the graph structure feature vector, that is, predicting whether the state is likely to transform into an abnormal state, which provides the core basis for the hierarchical anti-theft strategy.
[0050] Existing precious metal feeding anti-theft technologies can only passively detect anomalies and cannot predict potential risks in advance. To address this, this application uses variability labels to determine the stability of the feeding status: for stable normal states (low variability label), routine monitoring and handling are performed; for clearly abnormal states (abnormal status with a low variability label), direct handling is performed; for unstable states (high variability label, such as normal feeding that deviates from the standard, or suspected anomalies that have not yet been confirmed), early warning and intervention are provided. This realizes an upgrade from "passive alarm" to "proactive prediction".
[0051] Based on graph structure feature vectors, the label variability algorithm is used to calculate the three-dimensional scores of posterior uncertainty, boundary proximity and sparsity for each sample.
[0052] Posterior uncertainty characterizes the fuzziness of the model's determination of the feeding state; boundary proximity characterizes whether the feeding state is at the normal-abnormal boundary; sparsity characterizes whether the feeding state is a rare industrial scenario.
[0053] The set clustering algorithm divides the three-dimensional scores into a high-score cluster and a low-score cluster. The high-score cluster is labeled with high variability (v=+1), indicating an unstable state that is prone to transformation into an anomaly; the low-score cluster is labeled with low variability (v=-1), indicating a stable state with clear normal / abnormal characteristics.
[0054] The variability labels include high-variability labels and low-variability labels. The clustering algorithms used include the Gaussian Mixture Model (GMM) and K-means clustering. The GMM is iterated 50 times, and for K-means the elbow method sets the number of clusters to 2.
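The clustering step can be sketched with a plain two-cluster K-means in Python. The sketch takes the three scores as given, since the text does not state how posterior uncertainty, boundary proximity, and sparsity are computed. The deterministic initialization, the iteration cap of 20, and the rule that the cluster with the larger centroid sum is the high-score cluster are assumptions of this example; the GMM variant is not shown.

```python
from math import dist

def two_means(points, iterations=20):
    """Plain 2-cluster K-means; returns (assignments, centroids).

    Initialized from the points with the smallest and largest coordinate
    sums so that the example is reproducible (a hypothetical choice).
    """
    centroids = [min(points, key=sum), max(points, key=sum)]
    assign = [0] * len(points)
    for _ in range(iterations):
        assign = [min((0, 1), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in (0, 1):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign, centroids

def variability_labels(scores):
    """Map each (uncertainty, boundary_proximity, sparsity) triple to v = +1 or -1."""
    assign, centroids = two_means(scores)
    high = max((0, 1), key=lambda c: sum(centroids[c]))
    return [1 if a == high else -1 for a in assign]
```

Samples in the high-score cluster receive v=+1 and the rest receive v=-1, matching the label convention in the text.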
[0055] Based on the requirements of precious metal feeding processes, a visual recognition algorithm is used to classify feeding status into normal feeding status (y=+1) and abnormal feeding status (y=-1). Abnormal feeding status is further subdivided into four categories: stolen materials, substituted materials, insufficient materials, and incorrect materials. Based on graph structure feature vectors, the feeding status is determined by fusing features from personnel actions, material status, and equipment parameters. The core of the judgment is to match and verify the three types of integrated features with the process requirements one by one. If all of them match, the feeding is normal. If one or more types of features conflict with the process requirements and form a chain of abnormal evidence that corroborates each other, the feeding is judged to be an abnormal state of the corresponding type.
[0056] The verification priority and correlation of the three types of integrated features are as follows: personnel action features are the behavioral basis, material status features are the core basis, and equipment parameter features are numerical evidence. All three are indispensable to avoid misjudgment based on a single feature.
[0057] The feeding status labels include: normal feeding status (y=+1) and abnormal feeding status (y=-1). The visual recognition algorithms include: YOLOv5 object detection model and MediaPipe key point extraction model.
[0058] By combining the graph structure feature vector of the spatiotemporal graph with the corresponding feeding state label y and variability label v, a dual-label dataset specifically for theft prevention of precious metal feeding is obtained.
[0059] In some embodiments of S500, the dual-label dataset is input into the trained multi-task fusion model.
[0060] The multi-task fusion model outputs predicted values of the feeding status label y and the variability label v. Based on the predicted values, the corresponding extension domain is determined, and the corresponding disposal strategy is executed.
[0061] Referring to Figure 2, based on the four-quadrant extension domain classification and combined with the security level requirements for precious metal feeding, the dual-label predicted values output by the model are divided into four extension domains: the negative quantitative-change domain, the negative qualitative-change domain, the positive qualitative-change domain, and the positive quantitative-change domain. Differentiated hierarchical anti-theft strategies are implemented for each extension domain.
[0062] In one embodiment, based on the predicted values of the feeding status label and the variability label, if the feeding status is determined to be a normal feeding state (y=+1) and the variability label is determined to be a low variability label (v=-1), then the corresponding extension domain is determined to be the positive quantitative-change domain. The feeding status characteristics are normal feeding, stable state, and no abnormal trend.
[0063] In one embodiment, based on the predicted values of the feeding status label and the variability label, if the feeding status is determined to be a normal feeding state (y=+1) and the variability label is determined to be a high variability label (v=+1), then the corresponding extension domain is determined to be the positive qualitative-change domain. The feeding status characteristics are normal feeding, unstable state, and prone to transformation into an abnormal state.
[0064] In one embodiment, based on the predicted values of the feeding status label and the variability label, if the feeding status is determined to be an abnormal feeding state (y=-1) and the variability label is determined to be a low variability label (v=-1), then the corresponding extension domain is determined to be the negative quantitative-change domain. The feeding status characteristics are abnormal feeding, stable state, and clear characteristics (such as clear theft / substitution).
[0065] In one embodiment, based on the predicted values of the feeding status label and the variability label, if the feeding status is determined to be an abnormal feeding state (y=-1) and the variability label is determined to be a high variability label (v=+1), then the corresponding extension domain is determined to be the negative qualitative-change domain. The feeding status characteristics are abnormal feeding, unstable state, and suspected abnormality not yet determined.
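The four embodiments above amount to a lookup from the label pair (y, v) to an extension domain, which can be sketched in Python as follows. The domain names used as dictionary values are this sketch's own wording for the four quadrants, and thresholding raw model scores at zero is an assumed decision rule, since the text does not state how predicted values are converted to labels.

```python
EXTENSION_DOMAINS = {
    (+1, -1): "positive quantitative-change domain",  # normal, stable
    (+1, +1): "positive qualitative-change domain",   # normal, unstable
    (-1, -1): "negative quantitative-change domain",  # abnormal, clear-cut
    (-1, +1): "negative qualitative-change domain",   # abnormal, undetermined
}

def sign(x):
    """Threshold a raw model score at zero (an assumed decision rule)."""
    return 1 if x >= 0 else -1

def extension_domain(y_pred, v_pred):
    """Return the extension domain for predicted status y and variability v."""
    return EXTENSION_DOMAINS[(sign(y_pred), sign(v_pred))]
```

Each domain can then be mapped to its own hierarchical anti-theft strategy.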
[0066] In some embodiments of S600, the corresponding hierarchical anti-theft strategy is determined by the corresponding extended domain.
[0067] After the anti-theft strategy is executed, feeding visual data continues to be collected for 5 seconds to analyze the trend of the feeding status: if the status trends toward normal, the multi-task fusion model is fine-tuned within the variability security threshold; if the status trends toward abnormal, the sample is marked as a high-risk sample and added to the dual-label dataset; if the status does not change, the original parameters are retained and monitoring continues. Staff intervention operations (such as cancelling a false alarm, supplementing a missed alarm, raising a manual alarm, or updating the process) are recorded, and the operation details are associated with the corresponding feeding visual data as samples to be corrected. The dual-label dataset is then corrected according to preset rules (for example, v is corrected to -1 for a false-alarm sample in the positive qualitative-change domain), and the corrected high-confidence samples are added to the dual-label dataset. The multi-task fusion model is retrained monthly on the updated dataset, the old model parameters are overwritten with the newly trained ones, and the model is redeployed to the edge to continuously improve its generalization. At the same time, the judgment criteria of the hierarchical anti-theft strategy are updated as the process changes, adapting to new feeding process requirements and achieving iterative optimization of the multi-task fusion model.
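The feedback rules in this paragraph can be summarized in a small Python sketch. The trend names, action names, and sample layout are hypothetical labels introduced for illustration; only the three trend branches and the false-alarm correction rule are taken from the text.

```python
def feedback_action(trend):
    """Map the trend observed over the 5-second window to the follow-up action."""
    actions = {
        "to_normal": "fine_tune_within_threshold",
        "to_abnormal": "add_high_risk_sample",
        "unchanged": "keep_parameters",
    }
    return actions[trend]

def correct_sample(sample, intervention):
    """Apply a staff correction to a dual-label sample.

    Only the rule given as an example in the text is shown: a false alarm
    on a normal, high-variability sample (y=+1, v=+1) is corrected to v=-1.
    """
    fixed = dict(sample)
    if intervention == "false_alarm_cancelled" and sample["y"] == 1 and sample["v"] == 1:
        fixed["v"] = -1
    return fixed
```

Corrected samples would then be appended to the dual-label dataset ahead of the monthly retraining.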
[0068] In some embodiments of this invention, in step S100, the process of obtaining the structured feature vector specifically includes the following steps: S110 uses the established target detection model to detect the feeder, materials and equipment in the feeding area as detection targets, and outputs a set of candidate areas for each detection target.
[0069] S120, based on the candidate regions of each detection target, uses the established extraction model and dataset to identify the key points of the core actions of the feeding behavior, outputs the visual recognition data of each key point, and extracts the visual features of the material to obtain visual material data.
[0070] S130: Acquire the equipment's working data, align the visual recognition data and visual material data with the working data by timestamp, and combine the aligned visual recognition data, visual material data and working data to form the feeding visual data.
[0071] S140, perform data cleaning on the visual data of material feeding, and construct a material feeding coordinate system with the center of the material feeding port in the material feeding area as the absolute reference origin; wherein, the visual data of material feeding includes visual recognition data, visual material data and working data.
[0072] S150 normalizes and converts the coordinates of the visual recognition data into relative coordinates in the feeding coordinate system, normalizes the visual material data and working data numerically, and eliminates interference factors.
[0073] S160 standardizes all cleaned and normalized data to obtain structured feature vectors.
[0074] In some embodiments of S110, based on the acquisition sensors installed in the feeding area, data is collected through the acquisition sensors, and the established target detection model is used to detect three core detection targets in the feeding area: the feeder, precious metal materials, and feeding equipment, and output a set of candidate areas for each target.
[0075] The target detection model is the YOLOv5 target detection model, but other detection models are not limited in this application. The feeding equipment includes: electronic scale, feeding port and material box.
[0076] Specifically, there are three sensors. The first sensor is located in the material handling area and covers the material bin and the starting section of the path from the material handling area to the feeding area, ensuring that the action of leaving the material bin is captured. The second sensor is located in the feeding area and covers the feeding port and the area within 1 m in front of the feeding area, ensuring that the action of the hand approaching the feeding port is captured. The third sensor covers the middle section of the path from the material handling area to the feeding area with a 1-2 m overlap in field of view, so that the hand does not completely leave the cameras' field of view during movement.
[0077] A number of positioning markers (such as black and white blocks) are placed on the ground between the material bin and the feeding port. By identifying the positioning markers, it is possible to confirm whether the person is moving or has reached the feeding area, supplementing the key point detection of the camera and avoiding process loss due to obstruction during movement.
[0078] In some embodiments of S120, based on the candidate region, using the established extraction model, 21 key points of the hands, 12 key points of the upper limbs and torso of the person feeding the material are cropped and extracted, the normalized coordinates and visibility information of each key point are output, and the confidence of the visibility information is extracted to form visual recognition data.
[0079] Simultaneously, visual features of precious metal materials are extracted to obtain visual material data.
[0080] The visual material data includes material type, material quantity, and location characteristics. It may also include other data, which are not limited in this application. The extraction model can be the MediaPipe Pose or MediaPipe Hands key point extraction model.
[0081] In some embodiments of S130, the working data of the feeding device is acquired, and the visual recognition data, visual material data and working data are aligned by timestamp and stored in the edge database to form feeding visual data. The acquisition interval is set to 200 ms per sample to balance real-time performance and computational load.
[0082] The operating data includes the weight of the electronic scale and the status of the feeding port. Other data may also be included, which are not limited in this application.
[0083] Specifically, the steps for determining that the feeding behavior is correct are as follows. During the material handling stage, the material handling area is monitored using the established target detection model. Material handling is considered complete only if all three of the following conditions are met simultaneously: 1) the established extraction model confirms that the key points of both wrists have entered the calibrated area of the material bin, with a confidence level > 0.6; 2) the established target detection model detects the target precious metal in the area surrounding the hands, with a confidence level > 0.7; 3) the exclusion condition does not apply, the exclusion condition being that the established extraction model detects the current material as not being the target precious metal, which indicates that the wrong material was handled. If all three conditions are met, the process proceeds to the next step; otherwise, the process is prohibited from proceeding to the next step.
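As an illustration only, the three pickup conditions above can be sketched as a single predicate. The thresholds 0.6 and 0.7 come from the paragraph above; the class name `PickupObservation`, its field names, and the function `pickup_complete` are hypothetical and are not part of the described models.

```python
from dataclasses import dataclass


@dataclass
class PickupObservation:
    # Hypothetical container for one frame's detections (names are assumptions).
    wrists_in_bin_conf: float      # confidence that both wrist key points are in the bin area
    target_near_hands_conf: float  # confidence that the target metal is detected near the hands
    wrong_material_detected: bool  # True if the material is identified as non-target


def pickup_complete(obs, wrist_thr=0.6, target_thr=0.7):
    """Return True only if all three pickup conditions hold at the same time."""
    return (
        obs.wrists_in_bin_conf > wrist_thr
        and obs.target_near_hands_conf > target_thr
        and not obs.wrong_material_detected
    )
```

A frame with both confidences above threshold but a wrong-material detection is rejected, which matches the exclusion condition.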
[0084] During the movement phase, to verify whether the material is in the hands of the feeder, the coordinates of the key points of the hand in the set extraction model are used to dynamically select a "50px area around the hand" as the target detection area. The set target detection model detects this area every frame. If the target precious metal is not detected for several consecutive frames, a "material loss" warning is triggered and the process is paused. The judgment of a brief occlusion is as follows: based on the material position and hand trajectory of the previous frame, the "material in hand" state is temporarily stored, and the on-site occlusion warning light is lit to remind the user. The maximum waiting time is 3 seconds. After the material is visible again, the verification is repeated.
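The movement-phase check can be sketched as follows, under stated assumptions: the "50px area around the hand" is interpreted here as the bounding box of the hand key points padded by 50 pixels, and the number of consecutive missed frames that triggers the warning defaults to 3, which is an illustrative value because the paragraph above only says "several". The names `hand_roi` and `MaterialLossMonitor` are hypothetical.

```python
def hand_roi(hand_points, margin=50):
    """Bounding box (x1, y1, x2, y2) of the hand key points, padded by `margin` pixels."""
    xs = [p[0] for p in hand_points]
    ys = [p[1] for p in hand_points]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)


class MaterialLossMonitor:
    """Counts consecutive frames in which the target metal is not seen near the hand."""

    def __init__(self, max_missed_frames=3):  # assumed value for "several" frames
        self.max_missed_frames = max_missed_frames
        self.missed = 0

    def update(self, metal_detected):
        """Feed one frame's result; return True when a material-loss warning is due."""
        self.missed = 0 if metal_detected else self.missed + 1
        return self.missed >= self.max_missed_frames
```

The brief-occlusion grace period of up to 3 seconds is not modeled here; it would sit in front of the counter and hold the "material in hand" state while the hand is hidden.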
[0085] During the feeding stage, the established target detection model and extraction model are used for detection. Feeding is considered complete only if all four of the following conditions are met simultaneously: 1) the extraction model confirms that the fingertip key point has entered the feed inlet calibration area, with a confidence level > 0.5; 2) the target detection model detects the target precious metal in the feed inlet area, with a confidence level > 0.6; 3) the IOU between the target precious metal detection frame and the feed inlet detection frame is > 0.3, which confirms that the material has entered the feed inlet rather than merely approached it; 4) the exclusion condition does not apply, the exclusion condition being that only a hand is detected in the feed inlet area without the precious metal, which indicates empty feeding and triggers an early warning.
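For reference, the IOU test and the feeding-stage conditions can be sketched as below. Boxes are assumed to be axis-aligned tuples `(x1, y1, x2, y2)`; the function names and the convention of passing `None` when no precious metal box is detected (the empty-feeding case) are assumptions for illustration. The thresholds 0.5, 0.6 and 0.3 are those given above.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def feeding_complete(fingertip_conf, metal_conf, metal_box, inlet_box):
    """Return True only if all feeding-stage conditions hold."""
    if metal_box is None:
        # Hand detected without precious metal: treated as empty feeding.
        return False
    return (
        fingertip_conf > 0.5
        and metal_conf > 0.6
        and iou(metal_box, inlet_box) > 0.3
    )
```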
[0086] After all three stages are completed, the current feeding behavior is considered to be correct. The collected and processed visual recognition data, visual material data and working data are aligned with the timestamp and stored in the edge database to form feeding visual data.
[0087] In some embodiments of S140, outliers are removed using the 3σ principle and missing values are filled using linear interpolation to achieve data cleaning.
[0088] Outliers can be instantaneous jumps in the electronic scale reading and / or key point detection errors. Missing values can result from the loss of key points caused by temporary occlusion or the loss of parameters caused by temporary sensor disconnection. This application does not impose any restrictions on the types of the above values.
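A minimal sketch of this cleaning step on a single numeric series (for example, the electronic scale weight), using only the Python standard library. Missing samples are represented as `None`; marking outliers as missing and then interpolating them, and filling leading or trailing gaps with the nearest known value, are assumptions made for illustration. Note that with the population standard deviation a single outlier can only exceed 3σ when the series has at least 11 samples.

```python
from statistics import mean, pstdev


def mask_outliers_3sigma(values):
    """Replace values farther than 3 standard deviations from the mean with None."""
    present = [v for v in values if v is not None]
    mu, sigma = mean(present), pstdev(present)
    if sigma == 0:
        return list(values)
    return [None if v is not None and abs(v - mu) > 3 * sigma else v for v in values]


def interpolate_missing(values):
    """Fill None entries by linear interpolation between the nearest known neighbours."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    if not known:
        return out
    for i in range(len(out)):
        if values[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]   # leading gap: copy nearest known value (assumption)
        elif right is None:
            out[i] = values[left]    # trailing gap: copy nearest known value (assumption)
        else:
            t = (i - left) / (right - left)
            out[i] = values[left] + t * (values[right] - values[left])
    return out
```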
[0089] A three-dimensional feeding coordinate system is constructed with the center of the feeding port as the absolute reference origin.
[0090] In some embodiments of S150, based on visual recognition data, the normalized coordinates of the 2D screen of each core key point of the feeder are converted into relative coordinates under the feeding coordinate system to achieve spatial normalization.
[0091] Specifically, the conversion formula is: x_rel=x_screen-x0, y_rel=y_screen-y0, z_rel=distance sensor acquisition value-z0, where x0, y0, and z0 are the coordinate values of the center of the feeding port in the feeding coordinate system.
[0092] Based on visual material data and work data, the material type, quantity, location characteristics, electronic scale weight, and feeding port status are numerically normalized. The min-max normalization formula is used to convert them into normalized values in the range of [0,1], eliminating spatial interference factors such as camera installation location, feeding personnel height, and material specifications, and ensuring the consistency and comparability of features.
[0093] Specifically, the normalization formula is: x_norm=(x-x_min) / (x_max-x_min).
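The min-max formula above can be applied to a feature column as in the following sketch. Returning 0.0 for a constant column (where x_max equals x_min and the formula is undefined) is an assumption, and the function name is hypothetical.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using x_norm = (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: the formula divides by zero, so map everything to 0.0 (assumption).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For example, scale weights of 10, 15 and 20 map to 0.0, 0.5 and 1.0.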
[0094] In some embodiments of S160, all cleaned and normalized feature data are uniformly standardized to form structured feature vectors.
[0095] In some embodiments of this invention, in step S200, the process of establishing the association between the temporal edge and the spatial edge includes the following steps: S210: Extract the structured feature vectors of the set number of consecutive time series to form the time series segment of feeding behavior. Take the feeder, material and equipment as spatiotemporal graph nodes. Each of the three spatiotemporal graph nodes has corresponding node features. S220, based on time-series segments, establishes the physiological connection relationship of key points of the feeder according to the node characteristics of the spatiotemporal graph nodes of the feeder within the same frame, and establishes the spatial association relationship of feeder-material-equipment according to the node characteristics of the three spatiotemporal graph nodes to form a spatial edge; S230, based on temporal segments, obtains temporal edges within adjacent frames based on the motion trajectories of the same node features.
[0096] In some embodiments of S210, a set number of structured feature vectors are extracted according to the time series to form a time sequence segment of feeding behavior. The feeder, material and equipment are used as nodes in the spatiotemporal graph. Each node has corresponding node features, and the node features carry normalized coordinates and visibility information.
[0097] Specifically, the node features of the spatiotemporal graph nodes of the feeder are as follows: 12 core key points of the upper limbs and torso defined in S120 are reused, and 21 key points of the hands defined in S120 are reused. The node features include: the relative coordinates in the feeding coordinate system in S150 and the visibility information and / or confidence of each key point in S120.
[0098] The node features of the spatiotemporal graph nodes of the materials are: the visual material data of material type, material quantity and location features extracted from S120 are reused. The node features include: the [0,1] interval values in S150 after numerical normalization, that is, the visual material data after numerical normalization.
[0099] The node features of the spatiotemporal graph node of the equipment are as follows: the working data of electronic scale weight and feeding port switch status collected in S130 are reused. The node features include the values obtained in S160 after numerical normalization and standardization, that is, the normalized and standardized working data, which eliminates differences in equipment measuring range.
[0100] In some embodiments of S220, the edge definition of the fused spatiotemporal graph is based on the spatial associations of S110 to S130 and the spatial references of S140 to S160.
[0101] Based on temporal segments, node features within the same frame are extracted. For the spatiotemporal graph nodes of the feeder, physiological connections (shoulder-elbow-wrist) are established for the feeder's key points. For the three spatiotemporal graph nodes of the feeder, material, and equipment, spatial relationships between the feeder, material, and equipment are established. Spatial edges are obtained through physiological and spatial connections.
[0102] Specifically, based on the spatial relationship between the feeder, materials, and equipment collected from S110 to S130 (such as whether the hand is in contact with the material or whether the material is directly above the feed port), and with the spatial distance calculated using the three-dimensional coordinate system with the feed port center of S140 as the origin, the spatial correlation characteristics under different camera perspectives and different feeder heights are comparable.
[0103] In some embodiments of S230, based on time segments, the time edge is the motion trajectory of the same node in adjacent frames.
[0104] The motion trajectory can be the movement trajectory of the hand from the electronic scale to the feeding port and / or the change trajectory of the material weight.
[0105] Specifically, based on the timestamp alignment of S130, the same node in adjacent frames constructs motion trajectories according to the timestamp order, and the changes in trajectory coordinates are relative coordinates after spatial normalization of S150, eliminating spatial interference of absolute coordinates.
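The edge structure described for S220 and S230 can be sketched as plain index pairs, assuming one feeder, one material and one equipment node per frame. Spatial edges connect the three nodes within a frame, and temporal edges connect the same node in adjacent frames. Edge weights and the physiological edges between the feeder's own key points are omitted here, and the function name is hypothetical.

```python
def build_edges(num_frames, nodes=("feeder", "material", "equipment")):
    """Return (spatial_edges, temporal_edges) over nodes identified as (frame, name)."""
    spatial, temporal = [], []
    for t in range(num_frames):
        # Spatial edges: every pair of nodes within the same frame.
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                spatial.append(((t, nodes[i]), (t, nodes[j])))
        # Temporal edges: the same node in frame t and frame t + 1.
        if t + 1 < num_frames:
            for name in nodes:
                temporal.append(((t, name), (t + 1, name)))
    return spatial, temporal
```

For a 3-frame segment this gives 9 spatial edges and 6 temporal edges.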
[0106] The final output of the fused spatiotemporal graph is a graph structure feature vector with node features and edge weights. Its dimension and numerical range are determined by the structured feature vector in S100, ensuring that it can be directly input into S400 for dual labeling without additional format conversion.
[0107] In some embodiments of this invention, in step S400, the process of generating the dual-label dataset includes the following steps: S410. Based on the graph structure feature vector, calculate the three-dimensional scores of posterior uncertainty, boundary proximity and sparsity for each sample. Use the set clustering algorithm to divide the three-dimensional scores into high-score clusters and low-score clusters, and assign labels to the high-score clusters and low-score clusters to obtain variable labels.
[0108] S420: Based on the graph structure feature vector, the set visual recognition algorithm is used to identify the abnormal feeding state and the normal feeding state, and the labels are assigned to the abnormal feeding state and the normal feeding state to obtain the feeding state label.
[0109] S430 combines the graph structure feature vector, the corresponding feeding status label, and the variability label to obtain a dual-label dataset.
[0110] In some embodiments of S410, the variability label v differs from traditional technical solutions that only determine normal or abnormal conditions. The essence of determining the variability label is to quantify, from the graph structure feature vectors, the variability of the material feeding state, that is, to predict whether the state will transform into an abnormal state, which provides the core basis for tiered anti-theft strategies.
[0111] To address the pain point that existing precious metal feeding anti-theft technologies can only passively detect anomalies and cannot predict potential risks in advance, this application uses the variability label to determine the stability of the feeding status: for stable normal states (low variability label), routine monitoring and handling are performed; for abnormal states (high variability label), direct handling is performed; for unstable states (high variability, such as feeding that is normal but deviates from the standard, or suspected anomalies that have not yet been determined), early warning and intervention are provided. This realizes an upgrade from "passive alarm" to "proactive prediction".
[0112] The graph structure feature vectors contain three types of normalized / standardized core features. Personnel action features: visual recognition data comprising the three-dimensional relative coordinates of key points of the feeder's upper limbs, torso, and hands, together with detection confidence and / or visibility information. Material state features: visual material data of the precious metal material's type, quantity, and location characteristics. Equipment parameter features: working data of the electronic scale's weight and the feed port's open / closed status.
[0113] The label variability algorithm is used to calculate the three-dimensional scores of posterior uncertainty, boundary proximity and sparsity for each sample, which characterize the degree of variability of the state from different perspectives. The higher the score, the stronger the variability.
[0114] The posterior uncertainty U characterizes the fuzziness of the model's determination of the feeding state.
[0115] Bayesian posterior probability: P(y=1|x)=P(x|y=1)P(y=1) / P(x)
[0116] Where: P(x|y=1): the class conditional density of each graph structure feature vector sample calculated using the Gaussian mixture model, P(y=1) is the prior probability, P(x)=P(x|y=1)P(y=1)+P(x|y=0)P(y=0), the closer the posterior probability is to 0.5, the more uncertain the model is.
[0117] Score for posterior uncertainty:
[0118] When P(y=1|x)=0.5, then U=1 (most uncertain); when P(y=1|x)=0 or 1, then U=0 (most certain). The posterior uncertainty U characterizes the ambiguity of the model's determination of the feeding state. x can be a sample, i.e., the graph structure feature vector of the current feeding behavior.
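The sketch below applies Bayes' rule to the class-conditional densities and prior defined above, then maps the posterior to an uncertainty score. The score formula itself is not reproduced in this text, so the linear form U = 1 - 2|P(y=1|x) - 0.5| used here is an assumption, chosen only because it satisfies the stated endpoints (U=1 at 0.5, U=0 at 0 or 1); an entropy-based score would satisfy them as well. Function names are hypothetical.

```python
def posterior(px_given_pos, px_given_neg, prior_pos):
    """P(y=1|x) from the two class-conditional densities and the prior P(y=1)."""
    evidence = px_given_pos * prior_pos + px_given_neg * (1.0 - prior_pos)
    return px_given_pos * prior_pos / evidence


def posterior_uncertainty(p_pos):
    """Assumed score: 1 at p=0.5, falling linearly to 0 at p=0 or p=1."""
    return 1.0 - 2.0 * abs(p_pos - 0.5)
```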
[0119] Boundary proximity B: Characterizes the distance between the feeding state characteristics and the normal-abnormal judgment boundary. The higher the value, the more likely the state is in the boundary area and it is very easy to transform into another state.
[0120] Calculate the log class-conditional densities under the two Gaussian mixture models (GMMs):
[0121] logP(x|y=1) and logP(x|y=0)
[0122] Calculate the absolute difference: ΔlogP=|logP(x|y=1)-logP(x|y=0)|
[0123] Boundary proximity score:
[0124] When ΔlogP approaches 0, the boundary proximity B approaches 1; when ΔlogP is at its maximum, the boundary proximity B approaches 0.
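A sketch of one boundary proximity score consistent with the limiting behaviour stated above. The exact formula is not reproduced in this text; normalizing the absolute log-density difference by its maximum over the dataset, B = 1 - ΔlogP / max(ΔlogP), is an assumption. The function name and the list-based inputs are hypothetical.

```python
def boundary_proximity(log_p_pos, log_p_neg):
    """Per-sample score in [0, 1]: 1 when the two log densities coincide, 0 at the largest gap."""
    deltas = [abs(a - b) for a, b in zip(log_p_pos, log_p_neg)]
    d_max = max(deltas)
    if d_max == 0:
        # Every sample sits exactly on the boundary.
        return [1.0 for _ in deltas]
    return [1.0 - d / d_max for d in deltas]
```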
[0125] Sparsity S: Characterizes the frequency of occurrence of this feeding state in actual industrial scenarios. The higher the value, the rarer the scenario, the more irregular the state characteristics, and the poorer the stability.
[0126] Using the total joint probability density: P(x)=P(x|y=1)P(y=1)+P(x|y=0)P(y=0)
[0127] Take the logarithm: logP(x)
[0128] The lower the logP(x) value, the more abnormal or sparse the sample is; Sparsity score: Sort logP(x) to convert it into a sparsity ranking:
[0129] N is the total number of samples, and rank() is the ascending ranking (rank=1 has the lowest density).
[0130] S tends towards 1: extremely sparse, highly variable; S tends towards 0: dense region, low variability.
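The rank-based sparsity score can be sketched as follows. The ranking formula is not reproduced in this text; S = 1 - (rank - 1) / (N - 1) is an assumption that matches the stated behaviour (rank 1, the lowest density, gives S = 1; the highest density gives S = 0). Ties are broken by sample order, which is also an assumption.

```python
def sparsity_scores(log_px):
    """Map each sample's log density to a score in [0, 1] via its ascending rank."""
    n = len(log_px)
    if n == 1:
        return [0.0]  # ranking is undefined for a single sample; 0.0 is an arbitrary choice
    order = sorted(range(n), key=lambda i: log_px[i])  # ascending: lowest density first
    scores = [0.0] * n
    for position, i in enumerate(order):
        rank = position + 1
        scores[i] = 1.0 - (rank - 1) / (n - 1)
    return scores
```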
[0131] The fitted 3D scores are classified using the designed clustering algorithm into high-score and low-score clusters. The average weighted composite score of each sample from the two clusters is compared. All samples in the cluster with the lower average weighted composite score are labeled with low variability (v=-1), while all samples in the cluster with the higher average weighted composite score are labeled with high variability (v=+1). This completes the construction of the dual-label dataset. In one embodiment, a score threshold can be set to differentiate between high and low scores.
[0132] Equal weighted comprehensive score
[0133] Average weighted composite score =
[0134] Based on the clustering results, samples are labeled and assigned unique and quantifiable rules. n represents the total number of samples; i represents the current sample and the current feeding action.
[0135] High variability label (v=+1): The sample is clustered into the high-score cluster, which indicates that the feeding state is unstable and there is a high risk of transformation into an anomaly. Low variability label (v=-1): The sample is clustered into the low-score cluster, which indicates a stable feeding state whose normal or abnormal characteristics are clear and show no trend of change.
[0136] The variable labels include high-variability labels and low-variability labels. The clustering algorithms used include Gaussian Mixture Model (GMM) and K-means clustering. The GMM model is iterated 50 times, and the K-means clustering uses the elbow rule to determine the number of clusters to be 2.
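The labeling rule can be illustrated with a small pure-Python two-cluster K-means over the (U, B, S) score triples. This is a sketch, not the described implementation: the equal-weight composite (U + B + S) / 3, the centroid initialization at the lowest and highest composite samples, and all function names are assumptions, and a GMM could be substituted for the clustering step.

```python
def composite(score):
    """Equal-weight mean of the three scores (assumed form of the composite score)."""
    return sum(score) / len(score)


def two_means(points, iters=50):
    """Tiny K-means with k=2; returns a cluster index (0 or 1) for each point."""
    centroids = [min(points, key=composite), max(points, key=composite)]
    assign = None
    for _ in range(iters):
        new_assign = [
            min((0, 1), key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def variability_labels(points):
    """v=+1 for the cluster with the higher mean composite score, v=-1 for the other."""
    assign = two_means(points)
    cluster_mean = {}
    for k in (0, 1):
        scores = [composite(p) for p, a in zip(points, assign) if a == k]
        cluster_mean[k] = sum(scores) / len(scores) if scores else float("-inf")
    high = max(cluster_mean, key=cluster_mean.get)
    return [1 if a == high else -1 for a in assign]
```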
[0137] In some embodiments of S420, based on the requirements of the precious metal feeding process, the feeding status is identified into normal feeding status (y=+1) and abnormal feeding status (y=-1) using the established visual recognition algorithm. The abnormal feeding status is further subdivided into four categories: stolen materials, substituted materials, insufficient materials, and incorrect materials. Based on graph structure feature vectors, the feeding status is identified and determined according to the fusion features of personnel actions, material status, and equipment parameters.
[0138] The core of the judgment is to match and verify the three types of integrated features with the process requirements one by one. If all of them match, the feeding is normal. If one or more types of features conflict with the process requirements and form a chain of abnormal evidence that corroborates each other, the feeding is judged to be an abnormal state of the corresponding type.
[0139] The verification priority and correlation of the three types of integrated features are as follows: personnel action features are the behavioral basis, material status features are the core basis, and equipment parameter features are numerical evidence. All three are indispensable to avoid misjudgment based on a single feature.
[0140] Specifically, the confidence level setting for each target identification should be based on the actual production data, such as the pixel count of the acquisition sensor and ambient lighting, and the confidence threshold should be set according to the test results.
[0141] Material matching requirements: The type and quantity of precious metal materials fed must be completely consistent with the production work order. It is forbidden to feed the wrong precious metal materials or replace them with non-target precious metal materials. Compliance requirements for quantity values: The weight of materials fed must accurately match the value approved in the production work order (the error must be within the allowable range of the process). It is forbidden to feed too little or too much material. All materials must be fed into the feeding port. It is forbidden to intercept or carry away materials during the process. Operational guidelines require that the entire feeding process must follow the standard procedure of "picking up (material bin) - moving (designated path) - feeding (feeding port)". Hands / materials must always be within the monitoring field of vision. Non-standard actions such as hands deviating from the designated path or materials leaving the hands and not entering the feeding port are prohibited.
[0142] In other words, the MediaPipe extraction model and YOLOv5 detection model are used to detect the entire feeding process of personnel action features to ensure that the entire process is within the monitoring field of vision. The confidence level obtained from the personnel action features is compared with the set confidence threshold. If the set confidence threshold is met, the feeding is considered to be normal; otherwise, it is considered to be abnormal feeding.
[0143] Material matching requirements: The type and quantity of precious metal materials fed must be completely consistent with the production work order. It is forbidden to feed the wrong precious metal materials or replace them with non-target precious metal materials. In other words, the material type and quantity in the material status characteristics are matched with the data approved in the work order. If they match perfectly, the feeder can feed the material normally; otherwise, it is considered abnormal feeding.
[0144] Compliance requirements for quantity values: The weight of materials fed must accurately match the value approved in the production work order (the error must be within the allowable range of the process). It is forbidden to feed too little or too much material. All materials must be fed into the feeding port. It is forbidden to intercept or carry away materials during the process. In other words, the weight of the electronic scale in the equipment parameter characteristics is matched with the data verified by the work order. If they match perfectly, the feeder can feed materials normally; otherwise, it is considered abnormal feeding.
[0145] The determination of all material feeding status is based on whether it meets the above process requirements. If it does, it is marked as a normal feeding status. The execution of process requirements is verified by integrating features, thereby defining normal / abnormal and the type of abnormality.
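As a simple illustration of the quantity compliance requirement, the measured weight can be compared with the work-order value within the allowable process error. The function name, the returned category strings, and the tolerance value in the example are hypothetical.

```python
def classify_quantity(measured, approved, tolerance):
    """Compare a measured weight with the work-order value within an allowed error."""
    if abs(measured - approved) <= tolerance:
        return "ok"
    return "insufficient" if measured < approved else "excess"
```

With an approved weight of 100.0 and an allowed error of 0.5, a reading of 99.8 passes, while 98.0 is flagged as insufficient.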
[0146] Combining the quantity compliance requirements and the action standard requirements, the abnormal evidence chain is determined from the fused personnel-material-equipment features as follows. On the premise that legitimate material collection has been completed, all features must be aligned by timestamp to form a continuous behavioral trajectory for verification. Personnel movements: the MediaPipe extraction model detects that the key points of both wrists have entered the material box calibration area and that the upper limb movements conform to normal material handling specifications. Material status: the YOLOv5 detection model detects the target precious metal A around the hand, and the quantity and / or volume match the work order verification value. Equipment parameters: the electronic scale detects a decrease in the weight of precious metal A in the bin that is consistent with the work order's approved value, with no abnormal fluctuations.
[0147] Core: The abnormal fusion feature chain during the movement / feeding stage, which is the key basis for determining material theft.
[0148] In one embodiment, the process for determining the normal feeding status includes: When all three types of fused features—personnel actions, material status, and equipment parameters—match the process requirements, and there are no abnormal features throughout the process, forming a continuous and compliant behavior trajectory, it is determined to be a normal material feeding state. The core fused feature conditions are: Based on the human action features in the graph structure feature vector, it is determined that the entire process follows the specified path, the key point trajectory is standardized, and there are no anomalies such as occlusion, deviation, or non-standard convergence / extension. The confidence of key points at each stage meets the set process confidence threshold. Based on the material state characteristics in the graph structure feature vector, the type and quantity of materials picked up and fed are consistent with the work order, and the materials are kept around the hand / feeding port area throughout the process, with no abnormalities such as detachment, replacement, or interception. Based on the equipment parameter features in the graph structure feature vector, the decrease value of the material bin and the increase value of the feeding port are consistent with the work order verification value, the error is within the allowable range of the process, the sensor and electronic scale have no abnormal jumps, and the material type matches the result of the equipment identification module.
[0149] The feeding status labels include: normal feeding status (y=+1) and abnormal feeding status (y=-1). The visual recognition algorithms include: YOLOv5 object detection model and MediaPipe key point extraction model.
[0150] In one embodiment, the process for determining the abnormal feeding status of the replacement material includes: Based on the human action features in the graph structure feature vector, the YOLOv5 detection model was used in the material handling stage to detect that both target precious metal A and non-target precious metal B were present around the hand. Based on the material state features in the graph structure feature vector, the YOLOV5 detection model was used in the feeding stage to detect that only non-target precious metal B was detected in the feeding port area, and no target precious metal A was detected. Based on the equipment parameter features and material state features in the graph structure feature vector, the electronic scale detected that the weight at the feeding port was consistent with the work order, but the material type features conflicted with the results of the equipment material identification module.
[0151] If the above core integration characteristic conditions are met, then the abnormal feeding state is considered to be a replacement material.
[0152] In one embodiment, the process for determining the abnormal feeding status of insufficient material includes: Based on the human action features in the graph structure feature vector, the actions of the feeder conform to the material picking-moving-feeding specification throughout the entire process, with no abnormalities such as deviation from the path or occlusion. Based on the material state features in the graph structure feature vector, the material quantity matches the work order during the material picking stage, and all materials are fed into the feeding port during the feeding stage. However, the YOLOV5 detection model detected that the material quantity picked was lower than the data set for the work order. Based on the equipment parameter features in the graph structure feature vector, the electronic scale detected that the decrease in the material bin and the increase in the material inlet were both lower than the data approved by the work order, and the error exceeded the allowable range of the process, and there was no approved record of insufficient material input.
[0153] If the above core integration characteristic conditions are met, the abnormal feeding state is considered to be insufficient material feeding.
[0154] In one embodiment, the process for determining the abnormal feeding status of incorrectly fed materials includes: Based on the human action features in the graph structure feature vector, it is determined that the human action is standardized throughout the entire process, without any anomalies such as obstruction, deviation from the path, or material entrainment. Based on the material state features in the graph structure feature vector, the YOLOv5 detection model was used to detect that the material being picked up and fed was non-target precious metal C, and there were no other materials. Based on the equipment parameter features and material status features in the graph structure feature vector, the electronic scale detected that the feeding weight was consistent with the work order, the material type feature was consistent with the result of the equipment material identification module, and there was no work order record for the non-target precious metal C in the production system, thus ruling out the possibility of asynchronous process adjustments.
[0155] If the above core integration characteristic conditions are met, the abnormal feeding status is considered to be incorrect material feeding.
[0156] For example, if material is smuggled during the movement stage (e.g., hidden in the palm or a pocket), the abnormal feeding state is determined as follows. Based on the human action features in the graph structure feature vector, the MediaPipe model detects that the trajectory of the hand key points deviates from the designated path from the material picking area to the feeding area, that the hand moves toward a pocket or clothing or outside the feeding area, and that the fingertip key points close together (the palm folds). The hand is occluded by the body beyond the set short-term occlusion range and / or duration, and when the hand key points reappear after the occlusion, their position has deviated from the original movement path.
[0157] Based on the material state features in the graph structure feature vector, the YOLOv5 detection model does not detect the target precious metal material within a 50 px area around the dynamically boxed hand region for 3 or more consecutive frames, and no target precious metal material is detected in the feeding area, after excluding process-permitted situations such as material dropping or equipment occlusion.
[0158] Based on the equipment parameter features in the graph structure feature vector, the electronic scale detects no increase in material weight at the feeding port and the level sensor sends no feeding signal. When these conditions are met, the material can be considered stolen.
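The three groups of conditions in [0156] to [0158] can be sketched as follows. This is a minimal sketch under stated assumptions: the 50 px area is treated as a circle of radius 50 px around a single hand point, and all function and parameter names are hypothetical. The detections themselves (hand key points, material centres) are assumed to come from upstream models and are passed in as plain coordinates.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def material_near_hand(hand: Point, material: Optional[Point],
                       radius_px: float = 50.0) -> bool:
    """True if a detected material centre lies within radius_px of the hand."""
    if material is None:  # nothing detected in this frame
        return False
    dx, dy = material[0] - hand[0], material[1] - hand[1]
    return (dx * dx + dy * dy) ** 0.5 <= radius_px


def longest_missing_run(hands: Sequence[Point],
                        materials: Sequence[Optional[Point]],
                        radius_px: float = 50.0) -> int:
    """Longest run of consecutive frames with no material near the hand."""
    longest = current = 0
    for hand, material in zip(hands, materials):
        if material_near_hand(hand, material, radius_px):
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


def suspected_smuggling(path_deviated: bool,
                        hands: Sequence[Point],
                        materials: Sequence[Optional[Point]],
                        material_in_feed_area: bool,
                        weight_increased: bool,
                        level_signal: bool,
                        min_missing_frames: int = 3) -> bool:
    """Combine action, material, and equipment evidence into one decision."""
    missing = longest_missing_run(hands, materials) >= min_missing_frames
    return (path_deviated and missing and not material_in_feed_area
            and not weight_increased and not level_signal)
```

The consecutive-frame requirement filters out single-frame detection dropouts, which would otherwise raise the false alarm rate.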
[0159] In some embodiments of S430, the graph structure feature vector of the fused spatiotemporal graph is combined with the corresponding feeding state label y and variability label v to obtain a dual-label dataset specifically for theft prevention of precious metal feeding.
[0160] Referring to Figure 2, in some embodiments of this invention, in S500, the process of partitioning the extension domain includes the following steps: S501, training the multi-task fusion model in the cloud using the dual-label dataset until the multi-task fusion model converges.
[0161] S510, based on the predicted value of the feeding status label, determining that the feeding status label indicates the normal feeding state, and based on the predicted value of the variability label, determining that the variability label indicates the low-variability state. S520, based on the normal feeding state and the low-variability label, determining from the four-quadrant extension domain that the corresponding extension domain is the positive quantitative change domain, so as to determine the hierarchical anti-theft strategy set for the positive quantitative change domain.
[0162] In some embodiments of S501, the structure of the multi-task fusion model is as follows: using an improved ST-GCN as a shared feature extraction layer, alternating stacking of spatial graph convolution and temporal convolution is performed on the fused spatiotemporal graph to extract global spatiotemporal fusion features of feeding behavior; after the ST-GCN output layer, two parallel prediction heads are set to share global spatiotemporal fusion features.
[0163] The parallel prediction heads are the feeding state prediction head and the variability prediction head.
[0164] Feeding status prediction head: uses a Softmax classifier to output the probability distribution over the normal feeding state and the abnormal feeding state, corresponding to the feeding status label y. Variability prediction head: uses the Sigmoid function to output the probability of high or low variability, corresponding to the variability label v.
[0165] A weighted joint loss function is used: $L = \alpha L_{\text{ST-GCN}} + \beta L_{\text{BCE}}^{y} + \gamma L_{\text{BCE}}^{v}$, where $L_{\text{ST-GCN}}$ is the spatiotemporal feature loss of ST-GCN, and $L_{\text{BCE}}^{y}$ and $L_{\text{BCE}}^{v}$ are the binary cross-entropy losses of the feeding status label y and the variability label v, respectively, with α=β=γ=1, balancing the training priority of the two tasks.
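As a worked illustration of the weighted joint loss for a single sample, the sketch below evaluates the three terms numerically. Several points are assumptions made here: the ST-GCN spatiotemporal feature loss is not specified in the text and is simply passed in as a number, `p_y` is taken to be the predicted probability of normal feeding (y=+1), `p_v` the predicted probability of high variability (v=+1), and the +1 / -1 labels are mapped to 1 / 0 targets before the binary cross-entropy is computed.

```python
import math


def bce(p: float, target: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction p in (0, 1) and a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def to_binary(label: int) -> int:
    """Map the +1 / -1 labels to 1 / 0 targets (an assumed encoding)."""
    return 1 if label == 1 else 0


def joint_loss(l_stgcn: float, p_y: float, y: int, p_v: float, v: int,
               alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """L = alpha * L_ST-GCN + beta * BCE(y) + gamma * BCE(v)."""
    return (alpha * l_stgcn
            + beta * bce(p_y, to_binary(y))
            + gamma * bce(p_v, to_binary(v)))
```

With equal weights, a confident and correct prediction on both heads drives the two cross-entropy terms toward zero, so the total is dominated by whatever the ST-GCN term contributes.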
[0166] The multi-task fusion model is trained in the cloud using a dual-label dataset until the model converges; the trained lightweight model is then deployed to an industrial edge computing box to achieve real-time inference, with inference time controlled within 100ms / frame.
[0167] In some embodiments of S510 and S520, the multi-task fusion model outputs the predicted values of the feeding status label y and the variability label v, and executes the corresponding hierarchical anti-theft strategy based on the extension domain corresponding to the predicted value.
[0168] Feeding status label: determines whether the current feeding behavior is in the normal feeding state or the abnormal feeding state; this is a result-based determination. Variability label: determines whether the current normal or abnormal feeding state is stable or is trending toward a change; this is a trend-based determination.
[0169] The predicted values of the dual tags are combined according to the pairwise pairing logic to form a four-quadrant extension domain, which provides a unique basis for subsequent hierarchical anti-theft strategies.
[0170] Based on the four-quadrant extension domain classification and the security level requirements for precious metal feeding, the dual-label predicted values output by the model are divided into four extension domains: the negative quantitative change domain, the negative qualitative change domain, the positive qualitative change domain, and the positive quantitative change domain. A differentiated, tiered anti-theft strategy is implemented for each extension domain. At the same time, the model's automatic differentiation mechanism is used to calculate the dominant variability factors (such as personnel hand movements, material weight, and / or electronic scale readings) to achieve precise intervention.
[0171] In one embodiment, based on the predicted values of the feeding status label and the variability label, if the feeding status label is determined to be the normal feeding state (y=+1) and the variability label is determined to be the low-variability state (v=-1), the corresponding extension domain is determined to be the positive quantitative change domain. The feeding state is characterized by normal feeding, a stable state, and no abnormal trend. The corresponding hierarchical anti-theft strategy is to maintain monitoring: relax the detection threshold, reduce computational overhead, and retain normal feeding data. The strategy for the dominant factors is: no intervention required; data collection continues.
[0172] In one embodiment, based on the predicted values of the feeding status label and the variability label, if the feeding status label is determined to be the normal feeding state (y=+1) and the variability label is determined to be the high-variability state (v=+1), the corresponding extension domain is determined to be the positive qualitative change domain. The feeding state is characterized by normal feeding in an unstable state that is prone to turning abnormal. The corresponding hierarchical anti-theft strategy is early warning and enhanced monitoring: local indicator lights flash, the detection frequency is increased to once every 100 ms, access to the feeding area is locked, and changes in the dominant factors are tracked. The strategy for the dominant factors is: focus monitoring on the dominant factors, and escalate the strategy if the drift toward an abnormal state continues.
[0173] In one embodiment, based on the predicted values of the feeding status label and the variability label, if the feeding status label is determined to be the abnormal feeding state (y=-1) and the variability label is determined to be the low-variability state (v=-1), the corresponding extension domain is determined to be the negative quantitative change domain. The feeding state is characterized by abnormal feeding in a stable, clearly established abnormal state. The corresponding hierarchical anti-theft strategy is immediate alarm and forced handling: push a remote alert to the security terminal, lock the feeding port and / or hopper, and capture video of the abnormal event to preserve evidence. The strategy for the dominant factors is: freeze the relevant parameters of the dominant factors, preserve evidence, and coordinate with security personnel for on-site intervention.
[0174] In one embodiment, based on the predicted values of the feeding status label and the variability label, if the feeding status label is determined to be the abnormal feeding state (y=-1) and the variability label is determined to be the high-variability state (v=+1), the corresponding extension domain is determined to be the negative qualitative change domain. The feeding state is characterized by abnormal feeding in an unstable state, that is, a suspected abnormality that is not yet confirmed. The corresponding hierarchical anti-theft strategy is trend tracking and minor intervention: a local silent alarm, a prompt on the security terminal, an on-site voice reminder to feed according to the specification, suspension of the feeding process, and continuous monitoring of the dominant factors. The strategy for the dominant factors is: guide personnel to correct the behaviors related to the dominant factors; if the trend does not reverse within 10 seconds, escalate to forced handling.
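The pairing of the two predicted labels into one of four extension domains can be sketched as a lookup. The enum member names are one English rendering of the four domains listed in [0170], and the 0.5 probability threshold used to turn a head's output into a +1 / -1 label is an assumption, not a value stated in the text.

```python
from enum import Enum


class ExtensionDomain(Enum):
    POSITIVE_QUANTITATIVE = "normal feeding, low variability"
    POSITIVE_QUALITATIVE = "normal feeding, high variability"
    NEGATIVE_QUANTITATIVE = "abnormal feeding, low variability"
    NEGATIVE_QUALITATIVE = "abnormal feeding, high variability"


def to_label(prob: float, threshold: float = 0.5) -> int:
    """Turn a predicted probability into a +1 / -1 label (threshold assumed)."""
    return 1 if prob >= threshold else -1


# (y, v) pairs follow the embodiments: y=+1 normal, y=-1 abnormal,
# v=+1 high variability, v=-1 low variability.
_DOMAIN_TABLE = {
    (1, -1): ExtensionDomain.POSITIVE_QUANTITATIVE,
    (1, 1): ExtensionDomain.POSITIVE_QUALITATIVE,
    (-1, -1): ExtensionDomain.NEGATIVE_QUANTITATIVE,
    (-1, 1): ExtensionDomain.NEGATIVE_QUALITATIVE,
}


def extension_domain(y: int, v: int) -> ExtensionDomain:
    """Look up the extension domain for a pair of predicted labels."""
    try:
        return _DOMAIN_TABLE[(y, v)]
    except KeyError:
        raise ValueError(f"labels must be +1 or -1, got y={y}, v={v}") from None
```

For instance, `extension_domain(to_label(0.9), to_label(0.2))` selects the quadrant for a sample predicted as normal feeding with low variability.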
[0175] In some embodiments of this invention, in step S600, the process of determining and iteratively optimizing the hierarchical anti-theft strategy includes the following steps: S610, determining the corresponding hierarchical anti-theft strategy from the corresponding extension domain.
[0176] S620, executing the corresponding set hierarchical anti-theft strategy, obtaining the feeding visual data within a set time period, analyzing the trend of change based on the feeding visual data, adjusting the dual-label dataset based on that trend, updating the multi-task fusion model, and optimizing the set hierarchical anti-theft strategy.
[0177] In some embodiments of S610, a differentiated hierarchical anti-theft strategy is implemented for each extension domain.
[0178] The extension domains include: the negative quantitative change domain, the negative qualitative change domain, the positive qualitative change domain, and the positive quantitative change domain.
[0179] In one embodiment, when the extension domain is the positive quantitative change domain, the feeding state is characterized by normal feeding, a stable state, and no abnormal trend. The corresponding hierarchical anti-theft strategy is to maintain monitoring: relax the detection threshold, reduce computational overhead, and retain normal feeding data. The strategy for the dominant factors is: no intervention required; data collection continues.
[0180] In one embodiment, when the extension domain is the positive qualitative change domain, the feeding state is characterized by normal feeding in an unstable state that is prone to turning abnormal. The corresponding hierarchical anti-theft strategy is early warning and enhanced monitoring: local indicator lights flash, the detection frequency is increased to once every 100 ms, access to the feeding area is locked, and changes in the dominant factors are tracked. The strategy for the dominant factors is: focus monitoring on the dominant factors, and escalate the strategy if the drift toward an abnormal state continues.
[0181] In one embodiment, when the extension domain is the negative quantitative change domain, the feeding state is characterized by abnormal feeding in a stable, clearly established abnormal state. The corresponding hierarchical anti-theft strategy is immediate alarm and forced handling: push a remote alert to the security terminal, lock the feeding port and / or hopper, and capture video of the abnormal event to preserve evidence. The strategy for the dominant factors is: freeze the relevant parameters of the dominant factors, preserve evidence, and coordinate with security personnel for on-site intervention.
[0182] In one embodiment, when the extension domain is the negative qualitative change domain, the feeding state is characterized by abnormal feeding in an unstable state, that is, a suspected abnormality that is not yet confirmed. The corresponding hierarchical anti-theft strategy is trend tracking and minor intervention: a local silent alarm, a prompt on the security terminal, an on-site voice reminder to feed according to the specification, suspension of the feeding process, and continuous monitoring of the dominant factors. The strategy for the dominant factors is: guide personnel to correct the behaviors related to the dominant factors; if the trend does not reverse within 10 seconds, escalate to forced handling.
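The per-domain strategies of S610, including the 10-second escalation rule, can be sketched as a small lookup. The dictionary keys, the `Strategy` structure, and the `select_strategy` function are hypothetical names. Treating the escalated handling as identical to the immediate alarm and forced handling strategy is an interpretation made here, since the text only says the case is escalated.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Strategy:
    name: str
    actions: Tuple[str, ...]


STRATEGIES: Dict[str, Strategy] = {
    "positive_quantitative": Strategy("maintain monitoring", (
        "relax detection threshold",
        "reduce computational overhead",
        "retain normal feeding data",
    )),
    "positive_qualitative": Strategy("early warning and enhanced monitoring", (
        "flash local indicator lights",
        "run detection every 100 ms",
        "lock access to the feeding area",
        "track dominant factors",
    )),
    "negative_quantitative": Strategy("immediate alarm and forced handling", (
        "push alert to security terminal",
        "lock feeding port and/or hopper",
        "capture video as evidence",
    )),
    "negative_qualitative": Strategy("trend tracking and minor intervention", (
        "silent local alarm",
        "voice reminder on site",
        "suspend feeding process",
        "monitor dominant factors",
    )),
}


def select_strategy(domain: str, seconds_without_reversal: float = 0.0,
                    escalation_after_s: float = 10.0) -> Strategy:
    """Pick the strategy for a domain; escalate an unresolved suspected anomaly."""
    if (domain == "negative_qualitative"
            and seconds_without_reversal >= escalation_after_s):
        return STRATEGIES["negative_quantitative"]  # assumed escalation target
    return STRATEGIES[domain]
```

Keeping the strategies in a table rather than in branching code makes it straightforward to adjust thresholds or actions for sites with different security levels.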
[0183] In some embodiments of S620, the corresponding hierarchical anti-theft strategy is determined from the corresponding extension domain and then executed.
[0184] After the anti-theft strategy is executed, feeding visual data is collected continuously for 5 seconds to analyze the trend of the feeding state: if the trend is toward normal, the multi-task fusion model is fine-tuned within the variable security threshold; if the trend is toward abnormal, the sample is marked as a high-risk sample and added to the dual-label dataset; if the state does not change, the original parameters are retained and monitoring continues. Staff intervention operations (such as cancelling a false alarm, supplementing a missed alarm, raising a manual alarm, or updating a process) are recorded, and the operation details are associated with the corresponding feeding visual data as samples to be corrected. The dual-label dataset is then corrected according to preset rules (for example, correcting v to -1 for a false-alarm positive qualitative change sample), and the corrected high-confidence samples are added to the dual-label dataset. The multi-task fusion model is retrained monthly on the new dataset, the old model's parameters are overwritten by the retrained model, and the model is redeployed to the edge to continuously improve its generalization. At the same time, the judgment criteria of the hierarchical anti-theft strategy are optimized in line with process updates so as to adapt to new feeding process requirements, achieving iterative optimization of the multi-task fusion model.
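The dataset side of this feedback loop can be sketched as two small functions: one that reacts to the observed trend and one that applies the example correction rule for false alarms. The `Sample` structure, the trend strings, and the returned action names are hypothetical; fine-tuning and retraining themselves are outside the scope of this sketch and are only signalled by the return value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    features: List[float]  # graph structure feature vector
    y: int                 # feeding status label: +1 normal, -1 abnormal
    v: int                 # variability label: +1 high, -1 low
    high_risk: bool = False


def apply_trend(dataset: List[Sample], sample: Sample, trend: str) -> str:
    """React to the trend observed after a strategy is executed."""
    if trend == "to_normal":
        return "fine_tune"          # caller fine-tunes the model
    if trend == "to_abnormal":
        sample.high_risk = True     # mark and keep as a high-risk sample
        dataset.append(sample)
        return "sample_added"
    if trend == "unchanged":
        return "keep_parameters"    # keep monitoring with current parameters
    raise ValueError(f"unknown trend: {trend}")


def correct_false_alarm(sample: Sample) -> Sample:
    """Example rule: a false-alarm positive qualitative sample gets v = -1."""
    if sample.y == 1 and sample.v == 1:
        sample.v = -1
    return sample
```

Only the label is changed by the correction rule; the feature vector is left untouched so that the corrected sample still reflects what the cameras and sensors actually recorded.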
[0185] Using S610 and S620, based on dual-label extended-domain four-quadrant classification, differentiated hierarchical anti-theft strategies are implemented. Simultaneously, variable dominant factors are calculated for precise intervention, linking audible and visual alarms, security terminals, and material feeding equipment to complete hierarchical handling. A dual closed loop of automatic algorithm feedback and manual operation feedback is achieved, correcting dual labels, supplementing with high-confidence samples, and periodically iterating and optimizing the model. The model can adapt to updates in material feeding processes, new abnormal material feeding methods, and adjustments to camera / sensor installation positions without requiring the re-collection of large amounts of data for model training, significantly reducing industrial deployment and maintenance costs.
[0186] Referring to Figure 3, this application also provides a visual recognition-based precious metal feeding anti-theft system, which can implement the above-mentioned visual recognition-based precious metal feeding anti-theft method. The system includes: a data processing module, used to acquire visual data of feeding behavior and construct a feeding coordinate system, and, based on the feeding coordinate system, to normalize the feeding visual data to obtain a structured feature vector.
[0187] The association establishment module is used to define feeders, materials, and equipment as spatiotemporal graph nodes based on the extracted structured feature vectors, and to establish temporal and spatial edges based on the spatiotemporal graph nodes.
[0188] The fusion spatiotemporal graph construction module is used to construct a fusion spatiotemporal graph based on spatiotemporal graph nodes, temporal edges, and spatial edges. Through the fusion spatiotemporal graph, the graph structure feature vector is output.
[0189] The dual-label building module is used to generate a dual-label dataset with feeding status labels and variability labels based on graph structure feature vectors.
[0190] The extension domain decision module is used to input the dual-label dataset into the trained multi-task fusion model, output the predicted values of the feeding state label and the variability label, and determine the corresponding extension domain based on the predicted values. The execution feedback module is used to determine the corresponding hierarchical anti-theft strategy based on the corresponding extension domain, so as to iteratively optimize the multi-task fusion model.
[0191] It is understood that the content of the above method embodiments is applicable to this system embodiment. The specific functions implemented in this system embodiment are the same as those in the above method embodiments, and the beneficial effects achieved are also the same as those achieved in the above method embodiments.
[0192] This application also provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the aforementioned visual recognition-based precious metal anti-theft method. This electronic device can be any smart terminal, including tablet computers, in-vehicle computers, etc.
[0193] It is understood that the content of the above method embodiments is applicable to this device embodiment. The specific functions implemented by this device embodiment are the same as those of the above method embodiments, and the beneficial effects achieved are also the same as those achieved by the above method embodiments.
[0194] Referring to Figure 4, which illustrates the hardware structure of an electronic device according to another embodiment, the electronic device includes the following. The processor 901 can be implemented using a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this application. The memory 902 can be implemented as a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 902 can store the operating system and other application programs. When the technical solutions provided in the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 902 and is called and executed by the processor 901 to implement the visual recognition-based precious metal feeding anti-theft method of the embodiments of this application. The input / output interface 903 is used to implement information input and output. The communication interface 904 is used to enable communication between this device and other devices; communication can be wired (such as USB or an Ethernet cable) or wireless (such as a mobile network, Wi-Fi, or Bluetooth). The bus 905 transmits information between the components of the device (e.g., the processor 901, memory 902, input / output interface 903, and communication interface 904). The processor 901, memory 902, input / output interface 903, and communication interface 904 are communicatively connected to each other within the device via the bus 905.
[0195] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described visual recognition-based method for preventing theft of precious metals.
[0196] It is understood that the content of the above method embodiments is applicable to this storage medium embodiment. The specific functions implemented in this storage medium embodiment are the same as those in the above method embodiments, and the beneficial effects achieved are also the same as those achieved in the above method embodiments.
[0197] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory may optionally include memory remotely located relative to the processor, and these remote memories can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
[0198] The embodiments described in this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. As those skilled in the art will know, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems.
[0199] Those skilled in the art will understand that the technical solutions shown in the figures do not constitute a limitation on the embodiments of this application, and may include more or fewer steps than shown, or combine certain steps, or different steps.
[0200] Those skilled in the art will understand that all or some of the steps in the methods disclosed above, as well as the functional modules / units in the systems and devices, can be implemented as software, firmware, hardware, or suitable combinations thereof.
[0201] Furthermore, the terms “comprising” and “having”, and any variations thereof, are intended to cover non-exclusive inclusion, such that a process, method, system, product, or apparatus that includes a series of steps or units is not necessarily limited to those steps or units that are explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such process, method, product, or apparatus.
[0202] It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And / or" is used to describe the relationship between related objects, indicating that three relationships can exist. For example, "A and / or B" can represent three cases: only A exists, only B exists, and both A and B exist simultaneously, where A and B can be singular or plural. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship. "At least one (item) of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one (item) of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.
[0203] The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but this does not limit the scope of the claims of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and substance of the embodiments of the present application shall be within the scope of the claims of the present application.
Claims
1. A method for preventing theft of precious metals during feeding based on visual recognition, characterized in that the method includes: obtaining visual data of feeding behavior and constructing a feeding coordinate system, and, based on the feeding coordinate system, normalizing the feeding visual data to obtain a structured feature vector; based on the extracted structured feature vector, defining the feeder, material, and equipment as spatiotemporal graph nodes, and establishing temporal edges and spatial edges based on the spatiotemporal graph nodes; based on the spatiotemporal graph nodes, the temporal edges, and the spatial edges, constructing a fused spatiotemporal graph, and outputting a graph structure feature vector through the fused spatiotemporal graph; based on the graph structure feature vector, generating a dual-label dataset with a feeding status label and a variability label; inputting the dual-label dataset into the trained multi-task fusion model, outputting the predicted values of the feeding status label and the variability label, and determining the corresponding extension domain based on the predicted values; and, based on the corresponding extension domain, determining the corresponding hierarchical anti-theft strategy so as to iteratively optimize the multi-task fusion model.
2. The method according to claim 1, characterized in that obtaining the visual data of feeding behavior includes: using the established target detection model and a data set to detect the feeder, material, and equipment in the feeding area as detection targets, and outputting the candidate region of each detection target; based on the candidate region of each detection target, using the established extraction model and the data set to identify the core action key points of the feeding behavior, outputting the visual recognition data of each key point, and extracting the visual features of the material to obtain visual material data; and acquiring the working data of the equipment, and aligning the visual recognition data and the visual material data with the working data according to timestamps, the aligned visual recognition data, visual material data, and working data forming the feeding visual data.
3. The method according to claim 1, characterized in that, The normalization process of the material feeding visual data to obtain a structured feature vector includes: The feeding visual data is cleaned, and the feeding coordinate system is constructed with the center of the feeding port in the feeding area as the absolute reference origin; wherein, the feeding visual data includes visual recognition data, visual material data and working data; The coordinates of the visual recognition data are normalized and converted into relative coordinates in the feeding coordinate system. The visual material data and the working data are numerically normalized and interference factors are eliminated. All cleaned and normalized data are standardized to obtain the structured feature vector.
4. The method according to claim 1, characterized in that, The step of establishing temporal and spatial edges based on the spatiotemporal graph nodes includes: Extract the structured feature vectors of the set number of consecutive time series to form a time series segment of feeding behavior, and take the feeder, material and equipment as nodes of the spatiotemporal graph. Each of the three spatiotemporal graph nodes has corresponding node features. Based on the time sequence segment, within the same frame, according to the node characteristics of the spatiotemporal graph nodes of the feeder, the physiological connection relationship of the feeder's key points is established, and according to the node characteristics of the three spatiotemporal graph nodes, the spatial association relationship between the feeder, material, and equipment is established to form the spatial edge; Based on the time sequence segment, the time edge is obtained within adjacent frames according to the motion trajectory of the same node feature.
5. The method according to claim 4, characterized in that, The material feeding visual data includes visual recognition data, visual material data, and working data. The three spatiotemporal graph nodes store corresponding node features, including: The visual recognition data and the relative coordinates obtained by normalizing the visual recognition data in the feeding coordinate system are used as the node features of the feeder; The visual material data is numerically normalized as the node feature of the material, and the working data that has undergone numerical normalization and standardization processing is used as the node feature of the device.
6. The method according to claim 1, characterized in that generating the dual-label dataset with the feeding status label and the variability label includes: based on the graph structure feature vector, calculating three-dimensional scores of posterior uncertainty, boundary proximity, and sparsity for each sample, using the set clustering algorithm to divide the three-dimensional scores into a high-score cluster and a low-score cluster, and assigning labels to the high-score cluster and the low-score cluster to obtain the variability label; based on the graph structure feature vector, using the set visual recognition algorithm to identify the abnormal feeding state and the normal feeding state, and assigning labels to the abnormal feeding state and the normal feeding state to obtain the feeding status label; and combining the graph structure feature vector with the corresponding feeding status label and variability label to obtain the dual-label dataset.
7. The method according to claim 1, characterized in that determining the corresponding extension domain based on the predicted values includes: based on the predicted value of the feeding status label, determining that the feeding status label indicates the normal feeding state, and based on the predicted value of the variability label, determining that the variability label is a low-variability label; and, based on the normal feeding state and the low-variability label, determining from the four-quadrant extension domain that the corresponding extension domain is the positive quantitative change domain, so as to determine the hierarchical anti-theft strategy set for the positive quantitative change domain; wherein the four-quadrant extension domain includes the negative quantitative change domain, the negative qualitative change domain, the positive qualitative change domain, and the positive quantitative change domain.
8. A visual recognition-based precious metal dispensing anti-theft system, characterized in that, The system includes: The data processing module is used to acquire visual data of feeding behavior, construct a feeding coordinate system, and perform normalization processing on the feeding visual data based on the feeding coordinate system to obtain a structured feature vector. The association establishment module is used to define the feeder, material and equipment as spatiotemporal graph nodes based on the extracted structured feature vector, and to establish temporal edges and spatial edges based on the spatiotemporal graph nodes; The fusion spatiotemporal graph construction module is used to construct a fusion spatiotemporal graph based on the spatiotemporal graph nodes, the time edges, and the spatial edges, and output graph structure feature vectors through the fusion spatiotemporal graph; A dual-label construction module is used to generate a dual-label dataset with feeding status labels and variability labels based on the graph structure feature vector; The extension domain decision module is used to input the dual-label dataset into the trained multi-task fusion model, output the predicted values of the feeding state label and the variability label, and determine the corresponding extension domain based on the predicted values. The execution feedback module is used to determine the corresponding hierarchical anti-theft strategy based on the corresponding extended domain, so as to iteratively optimize the multi-task fusion model.
9. An electronic device, characterized in that, The electronic device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to implement the method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that, When the computer program is executed by a processor, it implements the method of any one of claims 1 to 7.