Multi-view independent flexible ton bag opening detection method and system

By employing a multi-view independent flexible ton bag opening detection method, and utilizing multi-view redundant design and lightweight decision layer fusion, the problems of low accuracy, poor stability, and insufficient real-time performance in ton bag opening detection are solved, achieving efficient and robust flexible ton bag opening detection.

CN122244847A · Pending · Publication Date: 2026-06-19 · GUANGZHOU HENLL ELECTRONICS EQUIP CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: GUANGZHOU HENLL ELECTRONICS EQUIP CO LTD
Filing Date: 2026-03-02
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing technologies for detecting the opening of ton bags suffer from low accuracy, poor stability, insufficient real-time performance, and weak adaptability. Efficient detection is particularly difficult because ton bag openings are flexible, non-rigid, easily deformed, and low in texture, and because the environment introduces strong interference.

Method used

A multi-view independent flexible ton bag opening detection method is adopted, which detects the flexible ton bag opening accurately through a multi-view redundancy design, lightweight decision-layer fusion, and an AI adaptive learning mechanism. The method includes multi-view synchronous data acquisition, single-view independent preprocessing, parallel independent inference, decision-layer adaptive fusion, and global state calculation. A self-attention mechanism and weighted 3D reconstruction are used to ensure the robustness and real-time performance of the detection.

Benefits of technology

It achieves high-precision and rapid detection of the bag opening of ton bags in complex environments, reduces system latency and hardware costs, enhances the system's adaptability and scalability, and can automatically adapt to detection tasks under different materials and lighting conditions.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122244847A_ABST

Abstract

This invention discloses a method and system for detecting the opening of a flexible ton bag based on multiple independent viewpoints, belonging to the field of industrial machine vision inspection. The method includes: constructing a multi-view perception array to simultaneously acquire image data of the ton bag opening area; performing independent preprocessing and ROI cropping on each image data stream; inputting each independent image stream into a lightweight AI inference model to compute in parallel the bag opening center coordinates, contour mask, state category probability, and local confidence vector under each viewpoint; constructing a decision-layer fusion module that receives the output vectors of all single-view models and uses a meta-learning mechanism to dynamically calculate the weight coefficient of each viewpoint from the current working environment parameters, the historical detection accuracy, and the confidence distribution of each viewpoint's output; and generating the globally optimal 3D pose estimate and state determination result of the bag opening through weighted voting or an attention mechanism, and sending the result to a PLC or robotic arm controller for execution.

Description

Technical Field

[0001] This invention relates to the fields of industrial automation and computer vision, specifically to a method and system for detecting the bag opening status during the automatic opening, bagging, and filling of flexible container bags (ton bags) in bulk packaging, logistics warehousing, and chemical production scenarios. In particular, it relates to a method and system for detecting the opening of flexible ton bags based on multiple independent viewpoints.

Background Technology

[0002] Ton bags, as a type of large-capacity, low-cost, and reusable flexible packaging container, are widely used in bulk material packaging and transportation in industries such as chemicals, building materials, grain, and pharmaceuticals. With the advancement of intelligent manufacturing, automatic opening, automatic bagging, and automatic filling of ton bags have become key aspects for improving production efficiency, reducing labor costs, and improving the working environment (avoiding dust hazards).

[0003] However, automating the operation of ton bags faces significant technical challenges. Unlike rigid objects (such as cardboard boxes and metal parts), ton bags are made of woven polypropylene (PP) or polyethylene (PE) fabric, exhibiting strong flexibility and non-rigidity. During lifting, moving, and storage, the bag opening is highly susceptible to irregular wrinkles, twisting, collapse, or even complete closure. Furthermore, industrial environments typically involve high concentrations of dust, varying natural or artificial light, and the specular reflection of the woven fabric surface, all of which severely interfere with the normal operation of vision sensors.

[0004] Currently, visual inspection technologies for the opening of ton bags are mainly divided into the following categories, but all of them have obvious limitations:

[0005] (1) Monocular vision scheme:

[0006] This is the most basic approach, which typically involves using a camera positioned in a fixed location to photograph the bag opening.

[0007] Limitations: Limited field of view and significant blind spots. When the bag opening is obscured by wrinkles or deflected at an excessive angle, the monocular camera cannot acquire complete information, leading to detection failure. Furthermore, monocular vision struggles to accurately acquire depth information (Z-axis coordinates), failing to meet the high-precision 3D grasping requirements of robotic arms.

[0008] (2) Traditional multi-view stereo vision solution:

[0009] Using binocular or multi-view cameras, the parallax is calculated through a stereo matching algorithm to reconstruct the 3D point cloud of the bag opening.

[0010] Limitations:

[0011] Dependence on Texture and Feature Points: Stereo matching algorithms rely heavily on rich textures and stable feature points in images. The surface of ton bags often has a simple texture (especially solid-color bags), and wrinkles can cause non-linear deformation of feature points, making the matching algorithm prone to failure and generating a large number of noise points or holes.

[0012] High computational complexity: High-resolution stereo matching and point cloud processing require enormous computing power, making it difficult to meet the real-time requirements of high-speed production line cycles (usually <500 ms per bag).

[0013] Sensitivity to the environment: Dust and reflections can violate the epipolar constraint conditions, leading to matching errors.

[0014] (3) Image stitching and fusion scheme:

[0015] Images from multiple cameras are stitched together into a panoramic image using a homography matrix, and then subjected to unified detection.

[0016] Limitations:

[0017] Registration difficulties: Since the ton bag is a flexible body, its shape varies greatly from different perspectives (non-rigid deformation). Traditional rigid transformations (rotation, translation) cannot accurately align images from different perspectives, and artifacts and misalignments are easily generated at the stitching points, which in turn interferes with subsequent detection algorithms.

[0018] High latency: Image stitching adds stages to the data processing pipeline and slows the system response.

[0019] (4) Rule-based traditional image processing schemes:

[0020] Traditional operators such as edge detection (Canny) and Hough Transform are used to extract circular or elliptical contours.

[0021] Drawbacks: Extremely poor robustness. Once the shape of the bag opening becomes irregular due to wrinkles, or the edge breaks due to uneven lighting, the traditional operator will fail to work, requiring frequent manual adjustment of the threshold parameters, and cannot adapt to changing working conditions.

[0022] In summary, existing technologies generally suffer from low accuracy, poor stability, insufficient real-time performance, and weak adaptability when detecting the opening of ton bags, which are characterized by "flexibility, non-rigidity, easy deformation, low texture, and strong interference".

Summary of the Invention

[0023] The purpose of this invention is to overcome the above-mentioned defects of the prior art and provide a method and system for detecting the opening of a flexible ton bag based on multiple independent perspectives.

[0024] The core objectives of this invention are:

[0025] Improve robustness: Through multi-view redundancy design, the system can still output accurate detection results when some views are blocked, reflected, or blurred.

[0026] Reduce complexity: Avoid complex image-level fusion and 3D reconstruction, and adopt lightweight decision-level fusion to significantly reduce computing power requirements and system latency.

[0027] Enhanced adaptability: Utilizing an AI adaptive learning mechanism, it automatically adapts to detection tasks under different materials, wrinkle levels, and lighting conditions.

[0028] Modularization: Supports dynamic adjustment of the number of cameras and hot switching of models, facilitating flexible deployment and maintenance of production lines.

[0029] One aspect of the present invention provides a method for detecting the opening of a flexible ton bag based on multi-view independent inference and decision-layer adaptive fusion, comprising the following steps:

[0030] Step S1: Multi-view synchronous data acquisition. N industrial cameras are deployed around the working area of the ton bag opening in a non-coplanar ring or multi-level array. An FPGA hardware trigger controller synchronously triggers all cameras within a microsecond-level time window to acquire a set of image frames, in which each frame is the original image captured by the i-th camera at time t; the robot arm pose and the environmental sensor data at that moment are recorded simultaneously. Each camera corresponds to a unique viewing angle and height;

[0031] Step S2: Single-view independent preprocessing. Each frame is processed in its own independent pipeline, which includes distortion correction based on the camera intrinsic matrix, adaptive dehazing and denoising based on histogram statistics, and cropping based on a prior ROI. Throughout this process, any cross-camera pixel-level stitching, feature map fusion, or epipolar constraint matching is strictly prohibited, to ensure physical isolation and computational parallelism of the data streams from different viewpoints.

[0032] Step S3: Parallel independent inference. The preprocessed images are fed separately into N parallel AI inference instances. The instances share the backbone network weights but have independent task-head parameters. Each instance outputs a local observation vector comprising:

[0033] The pixel coordinates of the bag opening center on the image plane;

[0034] The rotation angle of the bag opening's main axis relative to the horizontal line;

[0035] The multi-class probability distribution of the bag opening state;

[0036] The semantic segmentation mask for the bag opening region;

[0037] The intrinsic confidence of the model output.

[0038] A tag recording the inference time and resource consumption;

[0039] Step S4: Adaptive fusion at the decision level. A decision fusion network is constructed. Its input is the set of local observation vectors and a context vector, where the context vector contains the statistical features detected within a historical sliding window. The network calculates the dynamic weight of each viewpoint using a multi-head self-attention mechanism, such that the weights are non-negative and sum to 1. The weight calculation explicitly incorporates the geometric importance of the viewpoint, the current image quality score, and the result of the consistency check against other viewpoints;

[0040] Step S5: Global state calculation, control, and anomaly removal. Based on the dynamic weights, weighted global optimization is performed:

[0041] State determination: A weighted voting mechanism is used to determine the final state label: $\hat{s} = \arg\max_{k \in \{1, \dots, K\}} \sum_{i=1}^{N} w_i \, p_{i,k}$;

[0042] where $\hat{s}$ is the global state label (dimensionless); $k$ is the category index (dimensionless); $K$ is the total number of categories (dimensionless); $\arg\max$ returns the index of the maximum value; $p_{i,k}$ is the single-view probability that view $i$ assigns to category $k$; and $w_i$ is the dynamic weight of view $i$.

[0043] 3D reconstruction: The two-dimensional coordinates from each viewpoint are combined with the camera extrinsic parameters and back-projected into a three-dimensional spatial ray $L_i$. A weighted distance loss function $J(P) = \sum_{i=1}^{N} w_i \, d(P, L_i)^2$ is constructed, where $d(P, L_i)$ is the Euclidean distance from the point $P$ to the line $L_i$. The globally optimal 3D coordinates are obtained by solving $P^* = \arg\min_P J(P)$;

[0044] Anomaly suppression: If the weight of a viewpoint is below the threshold, or if its residual falls outside the 3σ range, the viewpoint is treated as an outlier and completely removed from the optimization;

[0045] Step S6: Closed-loop control and execution. The calculated 3D coordinates, the global attitude angle, and the state label are encapsulated as a standard control message and sent to the PLC or robotic arm controller via real-time industrial Ethernet (EtherCAT / Profinet) to guide the actuator in performing adaptive bag opening, bagging, or correction actions.

[0046] Preferably, the AI inference instance described in step S3 adopts a composite architecture of "shared backbone - independent branches - knowledge distillation":

[0047] Backbone network: Adopts an improved MobileNetV3-Large or Swin-Tinyv2 architecture with the last fully connected layer removed, and outputs multi-scale feature maps. The backbone weights are shared in GPU memory among all N inference instances, reducing the GPU memory usage to 1/N of the original.

[0048] Multi-task heads:

[0049] Regression head: Contains two fully connected branches, which predict the center point offset and the angle, respectively. The activation functions are a combination of Sigmoid and Linear;

[0050] Classification Head: It uses global average pooling followed by a Softmax layer to output the probabilities of four states;

[0051] Segmentation Head: Employs a lightweight FPN (Feature Pyramid Network) structure, upsamples the feature map to the original image resolution, and outputs a binarized bag opening mask;

[0052] Knowledge distillation training strategy: During the offline training phase, a teacher network with a very large parameter count (such as ResNet-101+DeepLabV3+) is introduced, and a KL divergence loss and a feature imitation loss are applied to transfer the dark knowledge of the teacher network to the lightweight student network, enabling the student network to maintain more than 95% accuracy while reducing the number of parameters by 90%.
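
The KL divergence term of such a distillation loss can be sketched as follows. This is a minimal illustration, not the patent's training code: the function names, the temperature value, and the conventional T² scaling are assumptions, and the feature imitation loss is not covered.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_kl_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The temperature of 4.0 and the T^2 factor follow common distillation
    practice; they are assumptions, not values taken from the patent.
    """
    p = softmax(teacher_logits, temperature)   # soft teacher targets
    q = softmax(student_logits, temperature)   # soft student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a 4-state classification head.
teacher = [4.0, 1.5, 0.2, -1.0]
student = [2.5, 1.0, 0.5, -0.5]
print(kd_kl_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's softened distribution exactly and grows as the two distributions diverge.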

[0053] The adaptive dynamic fusion of the decision layer described in step S4 specifically includes the following sub-steps:

[0054] 3.1 Feature embedding mapping: Each local observation vector is mapped through a learnable linear mapping layer into a high-dimensional latent vector;

[0055] 3.2 Context-aware encoding: The ambient light intensity, the dust concentration, and the relative position of the robotic arm are encoded as a context vector, which is concatenated with or added to the latent vector;

[0056] 3.3 Consistency attention calculation: The query matrix $Q$, key matrix $K$, and value matrix $V$ are calculated. The attention score between view $i$ and view $j$ measures the consistency between the two views. If view $i$ differs significantly from most other views, its average attention score is reduced;

[0057] 3.4 Dynamic weight generator: The attention-aggregated vector is fed into a small MLP network, which outputs unnormalized logits $z_i$. The final weights are calculated using temperature-scaled softmax: $w_i = \exp(z_i / T) / \sum_{j=1}^{N} \exp(z_j / T)$, where $T$ is a learnable temperature parameter that controls the smoothness of the weight distribution;

[0058] 3.5 Confidence gating mechanism: A hard gating strategy is introduced. If a view's confidence falls below its threshold, or its image entropy falls outside the acceptable range (indicating that the image is too dark or overexposed), its weight is forcibly reset to zero, and the remaining weights are redistributed by renormalization.
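
Sub-steps 3.4 and 3.5 can be sketched together as follows. This is only an illustration under stated assumptions: the function name `fusion_weights`, the confidence threshold of 0.3, and the entropy range of 3.0 to 7.5 bits are hypothetical, and the logits are passed in directly instead of coming from an attention network and MLP.

```python
import math


def fusion_weights(logits, confidences, entropies, temperature=1.0,
                   min_conf=0.3, entropy_range=(3.0, 7.5)):
    """Temperature-scaled softmax over view logits with hard gating.

    Views whose confidence is below min_conf, or whose image entropy lies
    outside entropy_range, get weight 0; the rest are normalized to sum to 1.
    All thresholds here are hypothetical placeholders.
    """
    low, high = entropy_range
    keep = [c >= min_conf and low <= h <= high
            for c, h in zip(confidences, entropies)]
    if not any(keep):
        raise ValueError("every view was gated out")
    # Shift by the largest surviving logit for numerical stability.
    m = max(z for z, k in zip(logits, keep) if k)
    exps = [math.exp((z - m) / temperature) if k else 0.0
            for z, k in zip(logits, keep)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three views; the third view has low confidence.
weights = fusion_weights(
    logits=[2.0, 1.0, 1.0],
    confidences=[0.95, 0.88, 0.15],
    entropies=[6.0, 6.0, 6.0],
)
print(weights)
```

A larger temperature flattens the distribution across the surviving views, while a smaller one concentrates weight on the view with the highest logit.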

[0059] Preferably, the global three-dimensional pose calculation in step S5 adopts the "weighted ray intersection optimization algorithm", the specific mathematical derivation of which is as follows:

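[0060] Let the optical center of the $i$-th camera in the world coordinate system be $C_i$, and let the unit direction vector of its ray be $v_i$ (obtained by back-projecting the detected center pixel in the image). The $i$-th ray can then be represented as $L_i(t) = C_i + t\,v_i$, $t \ge 0$;

[0061] The goal is to find a spatial point $P$ that minimizes the weighted sum of the squared distances from the point to all rays. The distance from $P$ to ray $L_i$ is: $d(P, L_i) = \lVert (I - v_i v_i^{\top})(P - C_i) \rVert$;

[0062] Construct the objective function: $J(P) = \sum_{i=1}^{N} w_i \, \lVert (I - v_i v_i^{\top})(P - C_i) \rVert^2$;

[0063] Taking the derivative of this function with respect to $P$ and setting it to zero (using the fact that $I - v_i v_i^{\top}$ is symmetric and idempotent), we obtain the system of linear equations:

[0064] $\sum_{i=1}^{N} w_i (I - v_i v_i^{\top})\, P = \sum_{i=1}^{N} w_i (I - v_i v_i^{\top})\, C_i$;

[0065] Let $A = \sum_{i=1}^{N} w_i (I - v_i v_i^{\top})$ and $b = \sum_{i=1}^{N} w_i (I - v_i v_i^{\top})\, C_i$; the optimal solution is $P^* = A^{-1} b$. Because $A$ may be singular (e.g., when all rays are parallel), in practice SVD or the addition of a regularization term should be used to obtain a robust solution;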

[0066] This method avoids the reliance on precise feature point matching in traditional triangulation methods. Even if there are large angular errors in some viewpoints, as long as the weight allocation is reasonable, it can still converge to the global optimal solution.
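
The weighted ray intersection can be sketched in plain Python as follows. The names `solve3` and `intersect_rays` and the sample geometry are illustrative assumptions. The sketch treats each ray as an infinite line, assumes at least two non-parallel rays with positive weight, and uses plain Gaussian elimination instead of the SVD or regularized solve that the text recommends for near-singular cases.

```python
import math


def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: the weighted rays may all be parallel")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(3):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][3] / M[i][i] for i in range(3)]


def intersect_rays(centers, directions, weights):
    """Point minimizing the weighted sum of squared distances to the given lines."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for C, v, w in zip(centers, directions, weights):
        norm = math.sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]              # unit direction of this ray
        for r in range(3):
            for c in range(3):
                # Entry (r, c) of the projector I - u u^T.
                m = (1.0 if r == c else 0.0) - u[r] * u[c]
                A[r][c] += w * m
                b[r] += w * m * C[c]
    return solve3(A, b)


# Hypothetical setup: three rays that all pass through the point (1, 2, 3),
# plus a fourth ray that misses it but carries zero weight.
centers = [(0, 0, 0), (5, 2, 3), (1, 2, 10), (0, 0, 0)]
directions = [(1, 2, 3), (-4, 0, 0), (0, 0, -7), (1, 0, 0)]
point = intersect_rays(centers, directions, weights=[0.5, 0.3, 0.2, 0.0])
print(point)
```

Because a zero weight removes a ray's contribution from both sides of the linear system, gated or rejected views need no special handling in the solver.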

[0067] The method also includes online adaptation and hot-swap mechanisms based on meta-learning:

[0068] Operating condition fingerprint recognition: The system extracts the "operating condition fingerprint" vector of the current environment in real time, which includes illumination histogram features, texture spectrum features, and the distribution of historical detection residuals;

[0069] Model library management: Multiple dedicated model packages, optimized for different materials and extreme environments, are pre-trained and stored in the cloud;

[0070] Rapid adaptation: When a significant shift in the operating condition fingerprint is detected (such as a switch from daytime to nighttime, or a change in ton bag material), a gradient-based meta-learning algorithm (such as MAML) fine-tunes the decision-layer weights or specific BatchNorm layer parameters at the edge using only a small number of recently collected samples (the support set, about 10-20 frames), achieving adaptation within seconds.

[0071] Dynamic topology reconfiguration: Camera nodes can be added or removed dynamically during operation. When a new camera is connected, the system automatically loads the default initialization parameters; then, during subsequent detections, it analyzes the correlation between the new camera's output and those of the other cameras by unsupervised clustering and automatically learns the camera's initial weight in the decision layer, without stopping the machine for recalibration or retraining.

[0072] In another aspect, the present invention provides a flexible ton bag opening detection system based on multi-view independent inference and decision-layer adaptive fusion, comprising:

[0073] Perception layer subsystem: Consists of 6-8 global-shutter industrial cameras, a synchronous trigger controller, a strobe light source, and a dust cover. The camera mounting bracket has a six-degree-of-freedom adjustment mechanism and supports rapid calibration;

[0074] Edge computing subsystem: Adopting a heterogeneous computing architecture, it includes a CPU (responsible for logic control and communication), a GPU / NPU (responsible for parallel inference), and an FPGA (responsible for image preprocessing acceleration). Internally, it runs containerized microservices, including image acquisition services, inference engine services, decision fusion services, and data caching services.

[0075] Decision fusion engine: Built-in adaptive weight allocation algorithm library, 3D reconstruction solver and anomaly diagnosis module;

[0076] Human-computer interaction and operation and maintenance terminal: Provides a visual interface to display confidence heatmaps from various perspectives, 3D reconstruction trajectories, weight distribution pie charts, and system health status in real time. Supports remote model deployment, parameter configuration, and log export;

[0077] Industrial communication gateway: Integrates Profinet, EtherCAT, Modbus TCP, and OPC UA protocol stacks to achieve bidirectional real-time communication with the upper-level MES system and the underlying PLC.

[0078] Preferably, the perception layer subsystem has an "active visual feedback" function:

[0079] If the decision-making level finds that the confidence level of all perspectives is low (such as being in a critical state), it can actively send instructions to adjust the brightness, exposure time or trigger angle of the strobe light source (if equipped with a motorized gimbal) to attempt active resampling in order to obtain higher quality images, rather than directly reporting an error.

[0080] Preferably, the system further includes a "data closed-loop iteration unit":

[0081] Hard example discovery: Automatically selects samples with confidence scores below the threshold, with excessively long detection times, or that have been manually corrected, and marks them as "Hard Examples";

[0082] Automatic annotation assistance: Uses multi-view geometric constraints to generate pseudo-labels that speed up manual annotation;

[0083] Cloud-based training cluster: Regularly uploads hard examples to the cloud, triggering incremental training tasks and generating new model versions;

[0084] Gray release (staged rollout): The new model is first trialed on a single production line or in a simulation environment to verify the performance improvement before it is pushed to all production-line edge nodes.
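
The hard example selection rule in this unit can be sketched as a simple filter. The record fields, the function name, and both thresholds are hypothetical; the patent does not state the confidence or latency limits.

```python
from dataclasses import dataclass


@dataclass
class DetectionRecord:
    frame_id: str
    confidence: float              # fused confidence of the detection, 0.0-1.0
    latency_ms: float              # end-to-end detection time for this frame
    manually_corrected: bool = False


def find_hard_examples(records, min_conf=0.5, max_latency_ms=50.0):
    """Select records that meet any of the three hard-example criteria.

    min_conf and max_latency_ms are placeholder thresholds for illustration.
    """
    return [
        r for r in records
        if r.confidence < min_conf
        or r.latency_ms > max_latency_ms
        or r.manually_corrected
    ]


# Hypothetical detection log.
log = [
    DetectionRecord("f001", 0.95, 21.0),
    DetectionRecord("f002", 0.32, 19.5),         # low confidence
    DetectionRecord("f003", 0.91, 74.0),         # slow detection
    DetectionRecord("f004", 0.88, 20.0, True),   # operator corrected the result
]
hard = find_hard_examples(log)
print([r.frame_id for r in hard])
```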

[0085] In another aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method steps described in any of the above schemes, including but not limited to image preprocessing, parallel inference, attention weight calculation, weighted 3D reconstruction, and control instruction generation.

[0086] In a further aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it can schedule data from multiple cameras, run multiple neural network instances in parallel, and calculate the three-dimensional pose of the opening of a flexible ton bag in real time, thereby controlling a robotic arm to complete automated operations.

[0087] Compared with the prior art, the present invention has the following significant advantages:

[0088] 1. Extremely high anti-interference capability: Due to the adoption of decision-level fusion, the failure of a single viewpoint (such as a completely white image due to reflection, lens blur due to dust, or feature loss due to wrinkles) will not cause the entire system to crash. The system can automatically "ignore" the broken viewpoint and make correct judgments based on other normal viewpoints, realizing a redundant fault-tolerant mechanism of "if one viewpoint is not working, another will."

[0089] 2. Superior real-time performance: It avoids time-consuming image stitching and dense point cloud reconstruction. Single-view inference can use extremely lightweight models (such as INT8-quantized MobileNets), and after parallel processing, only simple vector operations are performed at the decision layer. Overall latency can be kept within 50 ms, meeting the requirements of high-speed production lines.

[0090] 3. Strong adaptability to flexible deformation: It does not rely on rigid geometric constraints (such as epipolar constraints), but rather on data-driven probabilistic fusion. Even if the bag opening is severely twisted, as long as there is a viewpoint that can capture effective features, the decision-making layer can comprehensively infer the true state, which is particularly suitable for non-rigid objects such as ton bags.

[0091] 4. System Scalability and Maintainability: The modular design allows adding or removing cameras without redeveloping the software; simply register the new node in the configuration file, and the decision layer will automatically learn the weight of the new node. It supports one-click switching of dedicated model packages for different types of ton bags (e.g., 1-ton, 2-ton, conductive bags).

[0092] 5. Reduced hardware costs: It does not require expensive high-precision LiDAR or structured light cameras. It only requires ordinary industrial area array cameras and edge computing boxes to achieve high-precision 3D positioning, which greatly reduces system integration costs.

[0093] To better understand and implement this invention, the following detailed description is provided in conjunction with the accompanying drawings.

Description of the Drawings

[0094] Figure 1 is a flowchart illustrating an exemplary method for detecting the opening of a flexible ton bag based on multiple independent viewpoints according to the present invention.

[0095] Figure 2 is a three-dimensional structural diagram illustrating the distribution of industrial cameras in an exemplary multi-view independent flexible ton bag opening detection system of the present invention.

[0096] Figure 3 is a side view of the industrial camera distribution in an exemplary flexible ton bag opening detection system based on multiple independent viewpoints according to the present invention.

Detailed Implementation

[0097] In the description of this invention, it should be understood that the terms "center," "longitudinal," "lateral," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," and "outer," etc., indicating orientations or positional relationships, are based on the orientations or positional relationships shown in the accompanying drawings and are used only for the convenience of describing the invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of the invention. In the description of this invention, unless otherwise stated, "a plurality of" means two or more.

[0098] As shown in Figures 1-3, the specific embodiments of the present invention are as follows:

[0099] Step 1: Multi-view synchronous data acquisition architecture

[0100] The system deploys N (preferably 6-8) industrial cameras above and around the ton bag handling station. These cameras are not randomly placed, but rather feature an optimized spatial layout:

[0101] Central main camera: Looks vertically downward and is mainly responsible for capturing the overall outline of the bag opening and the approximate position of its center.

[0102] Peripheral auxiliary cameras: Distributed at different heights and angles (such as a 45-degree tilt), they are mainly responsible for capturing the side shape of the bag opening, fold details, and edge depth information.

[0103] All cameras achieve microsecond-level synchronized exposure through hardware triggers, ensuring that the bag opening status is captured at the same moment and avoiding timing errors caused by robotic arm movement or bag shaking.

[0104] Step 2: Single-view independent preprocessing and inference (decoupled design)

[0105] This is the biggest difference between this invention and traditional solutions. The system does not perform any cross-camera image stitching, feature map fusion, or point cloud registration.

[0106] Independent preprocessing: The image stream from each camera enters an independent preprocessing thread, which only performs noise reduction, contrast enhancement, distortion correction, and ROI (region of interest) cropping based on prior knowledge.

[0107] Independent Inference: Each preprocessed image is fed into an independent AI inference instance. Although these instances share the weights of the same backbone to save GPU memory, they have their own independent task heads.

[0108] Output definition: Each inference instance outputs a structured local result vector, which includes:

[0109] 2D coordinates: The pixel coordinates of the bag opening center on the current image plane.

[0110] Geometric parameters: major axis, minor axis, and rotation angle of the bag opening ellipse.

[0111] State probability: The probability distribution of the bag opening being in a state such as "flat", "slightly wrinkled", "severely wrinkled", "closed", or "offset".

[0112] Confidence score: The confidence score of the model for this detection result (0.0-1.0).

[0113] Semantic mask: (Optional) Pixel-level segmentation mask for the bag opening area.

[0114] Step 3: Adaptive Integration at the Decision-Making Level (Core Innovation)

[0115] All local result vectors are then aggregated at the decision layer. The decision layer no longer processes raw pixels, but rather high-level semantic information.

[0116] Dynamic weight calculation: The decision layer incorporates a lightweight fusion network (such as an MLP or Transformer Encoder). The network inputs include:

[0117] Local result vectors from each viewpoint.

[0118] Environmental context vector (data from light sensor and dust sensor).

[0119] Historical statistics (average accuracy of each camera over a past period).

[0120] Attention Mechanism: The fusion network uses a self-attention mechanism to analyze the consistency between different viewpoints. If the output of a certain viewpoint seriously conflicts with the output of most other viewpoints (for example, the other 5 cameras all detect that the bag opening is on the left, but only camera A detects that it is on the right, and camera A has a low confidence level), the network will automatically reduce the weight of camera A, or even treat it as noise and remove it.

[0121] Weighted fusion: Based on the calculated dynamic weights, the coordinates and angles from each viewpoint are weighted and averaged to obtain the globally optimal estimate. For state classification, a weighted voting mechanism is used to determine the final state.
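
As a minimal sketch of the weighted voting described above, the following snippet sums the per-view class probabilities scaled by each view's dynamic weight and picks the class with the highest total. The function name `weighted_vote`, the three-state label list, and the sample numbers are illustrative assumptions, not values from the embodiment.

```python
def weighted_vote(view_probs, weights):
    """Return (winning class index, fused scores) for per-view class probabilities.

    view_probs: one probability list per camera view (all the same length).
    weights: one non-negative fusion weight per view.
    """
    if len(view_probs) != len(weights):
        raise ValueError("need exactly one weight per view")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one view needs a positive weight")
    num_classes = len(view_probs[0])
    # Weighted average of the class probabilities across views.
    scores = [
        sum(w * probs[k] for w, probs in zip(weights, view_probs)) / total
        for k in range(num_classes)
    ]
    best = max(range(num_classes), key=lambda k: scores[k])
    return best, scores


# Hypothetical example: three views, three states.
STATES = ["flat", "wrinkled", "closed"]
probs = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]]
label, fused = weighted_vote(probs, weights=[0.6, 0.3, 0.1])
print(STATES[label], [round(s, 2) for s in fused])
```

With these weights the first view dominates and the vote lands on "flat"; shifting weight toward the other two views flips the result to "wrinkled", which is the behavior the dynamic weights are meant to control.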

[0122] Step 4: Virtual 3D Reconstruction and Control Output

[0123] Based on the fused 2D information and camera calibration parameters, the system uses the "ray intersection optimization method" instead of the traditional triangulation method to solve the 3D coordinates.

[0124] The 2D detection points of each viewpoint are back-projected into a ray in three-dimensional space.

[0125] Construct an objective function to find a point in space that minimizes the sum of the squared distances from that point to all weighted rays. Rays with higher weights exert a stronger constraint on the final point.

[0126] The solved 3D coordinates and attitude angles are directly converted into motion commands for the robotic arm, guiding the gripper or suction cup to perform the bag-opening action.
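
The back-projection of a 2D detection into a 3D ray can be sketched with a pinhole camera model. The function name, the intrinsic values (`fx`, `fy`, `cx`, `cy`), and the camera pose below are hypothetical; the sketch also assumes the pixel has already been undistorted and is expressed in full-image coordinates, not ROI coordinates.

```python
import math


def backproject(pixel, fx, fy, cx, cy, rot_cam_to_world, camera_center):
    """Turn a pixel into a world-frame ray (origin, unit direction).

    rot_cam_to_world is a 3x3 rotation matrix given as nested lists;
    camera_center is the optical center in world coordinates.
    """
    u, v = pixel
    # Direction in the camera frame for an ideal pinhole camera.
    d_cam = [(u - cx) / fx, (v - cy) / fy, 1.0]
    d_world = [sum(rot_cam_to_world[r][c] * d_cam[c] for c in range(3))
               for r in range(3)]
    norm = math.sqrt(sum(x * x for x in d_world))
    return list(camera_center), [x / norm for x in d_world]


# Hypothetical main camera 2 m above the station, looking straight down
# (camera z-axis maps to world -z, camera y-axis to world -y).
R_DOWN = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
origin, direction = backproject(
    pixel=(740.0, 512.0), fx=1000.0, fy=1000.0, cx=640.0, cy=512.0,
    rot_cam_to_world=R_DOWN, camera_center=(0.0, 0.0, 2.0),
)
# Where this ray meets the floor plane z = 0.
t = -origin[2] / direction[2]
hit = [o + t * d for o, d in zip(origin, direction)]
print(direction, hit)
```

Rays produced this way, one per view, are the inputs that the weighted objective function constrains.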

[0127] Example 1: System Hardware Setup

[0128] This embodiment is applied to the automatic ton bag filling production line of a chemical company.

[0129] Camera selection and layout:

[0130] Six industrial cameras were selected, with a resolution of 1280×1024, a frame rate of 60fps, and a global shutter.

[0131] Layout scheme: One camera is installed vertically 2 meters above the filling port (main camera); five cameras are evenly distributed on a circle with a diameter of 1.5 meters, installed at a height of 1.2 meters, with the lenses tilted downwards at 45 degrees (auxiliary cameras).

[0132] All cameras are connected to the PLC via an IO controller to achieve hardware-triggered synchronization, with the exposure time set to 200 μs to suppress motion blur.

[0133] A high-brightness stroboscopic light source is installed to synchronize with the camera's trigger signal, thereby eliminating the influence of ambient light fluctuations.

[0134] Computing platform:

[0135] An edge computing module with 100 TOPS of AI computing power is used.

[0136] Deploy Docker containers to manage all inference services.

[0137] Communication interface:

[0138] The camera is connected to the computing platform via a PoE switch.

[0139] The computing platform communicates with the PLC via the Profinet protocol and with the FANUC robotic arm controller via TCP / IP.

[0140] Example 2: Detailed Explanation of Software Algorithm Flow

[0141] Step 1: Data Preprocessing

[0142] When the PLC sends an "in position" signal, the cameras are triggered to capture images.

[0143] The image is fed into the preprocessing queue.

[0144] Gaussian filtering is used to remove high-frequency noise.

[0145] CLAHE (Contrast Limited Adaptive Histogram Equalization) is used to enhance the contrast at the edge of the bag opening, especially in backlit or shadowed areas.

[0146] Based on the preset robotic arm working area, the ROI area (approximately 640×640 pixels) is cropped to reduce unnecessary calculations.

[0147] Step 2: Single-view model inference

[0148] The pre-trained improved YOLOv8-Seg model is loaded as the backbone. This model has been pruned and INT8 quantized, and the single-frame inference time is <5ms.

[0149] Six parallel image inputs and six inference threads.
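One way to picture the six independent inference paths is a thread pool with one task per view. The sketch below uses only the Python standard library; `infer_single_view` is a hypothetical placeholder that returns a dummy observation, since the actual model call is not described at code level in this document.

```python
from concurrent.futures import ThreadPoolExecutor


def infer_single_view(camera_id, image):
    """Hypothetical stub for one per-view model call.

    A real system would run the quantized segmentation model here; this
    placeholder only returns a dummy local observation.
    """
    return {
        "camera": camera_id,
        "center": (0.0, 0.0),
        "angle": 0.0,
        "state": "unknown",
        "confidence": 0.0,
    }


def infer_all_views(images):
    """Run one inference task per view; results keep the input order."""
    with ThreadPoolExecutor(max_workers=len(images)) as pool:
        futures = [
            pool.submit(infer_single_view, cam_id, img)
            for cam_id, img in enumerate(images, start=1)
        ]
        return [f.result() for f in futures]
```

No task reads another task's image or result, which mirrors the requirement that the views stay isolated until the decision layer.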

[0150] Output example:

[0151] Camera 1 (Main Camera): Center (320, 310), Angle 5°, Status "Flat", Confidence 0.95.

[0152] Camera 2 (side view): Center (400, 200), angle 12°, status "wrinkled", confidence level 0.88.

[0153] Camera 3 (side view): Center (390,210), angle 10°, status "wrinkled", confidence level 0.15 (the image is blurry due to a small amount of dust on the lens, and the model automatically gives a low confidence level).

[0154] ...Other camera outputs omitted.

[0155] Step 3: Decision-Level Fusion

[0156] The decision network receives 6 sets of data.

[0157] Consistency check: The results of camera 3 were found to deviate significantly from those of other cameras, and its confidence level was only 0.15.

[0158] Weight calculation: The fusion network calculates the weight of each view from a formula in which one term indicates the spatial similarity between that view's result and the other results.

[0159] Calculation results: The weight of camera 3 approaches 0, while cameras 1, 2, 4, 5, and 6 receive higher weights.
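The weight formula itself is not reproduced in this text, so the following sketch does not attempt to restate it. It only illustrates two ingredients that the claims name, a temperature-scaled Softmax and a hard confidence gate, with the per-view logits supplied as hypothetical inputs. The function name `gated_softmax_weights`, the default temperature, and the use of 0.3 as the gate threshold (borrowed from the example threshold in [0167]) are assumptions; the image-entropy gate is left out here.

```python
import math


def gated_softmax_weights(logits, confidences, tau=1.0, conf_threshold=0.3):
    """Temperature-scaled softmax over views, with low-confidence views set to 0.

    logits:         hypothetical unnormalised scores, one per view.
    confidences:    per-view confidence values in [0, 1].
    tau:            temperature; larger values flatten the distribution.
    conf_threshold: assumed gate threshold; views below it get weight 0.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    active = [i for i, c in enumerate(confidences) if c >= conf_threshold]
    weights = [0.0] * len(logits)
    if not active:
        # Every view is gated out; the caller should treat this as unrecognizable.
        return weights
    peak = max(logits[i] for i in active)  # subtract the max for numerical stability
    exps = {i: math.exp((logits[i] - peak) / tau) for i in active}
    z = sum(exps.values())
    for i in active:
        weights[i] = exps[i] / z
    return weights
```

Gating before normalization means the weight removed from a rejected view is shared among the remaining views automatically.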

[0160] Global solution:

[0161] Coordinate fusion: (After coordinate transformation).

[0162] Status determination: The weighted voting results show that "wrinkled" has the highest probability, so it is determined to be in a wrinkled state.

[0163] 3D reconstruction: Using the weighted ray intersection, the coordinates of the bag opening center in the robot arm base coordinate system are calculated as (X=1200mm, Y=500mm, Z=850mm).

[0164] Step 4: Execute Control

[0165] The system sends the coordinates and status to the robotic arm.

[0166] The robotic arm automatically switches to the "rubbing to open" mode (first clamping the two sides of the bag opening and rubbing back and forth to open the bag opening) based on the "wrinkled" state, instead of the direct "insertion" mode.

[0167] If the confidence level of all cameras is below the threshold (e.g., <0.3), the system determines that it is "unrecognizable", triggers an audible and visual alarm and requests manual intervention to prevent accidental operation.
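The control logic of Step 4 can be summarized in a few lines. This is a sketch under stated assumptions: the function name `choose_action` and the returned strings are hypothetical, the mapping of the "flat" state to direct insertion is inferred rather than stated, and states without a known mapping return `None` instead of guessing.

```python
def choose_action(fused_state, confidences, threshold=0.3):
    """Pick a robot action from the fused state and per-camera confidences.

    Returns "alarm_manual_intervention" when every camera is below the
    threshold, otherwise the action mapped to the fused state, or None if
    this sketch has no mapping for that state.
    """
    if all(c < threshold for c in confidences):
        return "alarm_manual_intervention"
    # Assumed mapping: only "wrinkled" -> rubbing is stated explicitly in the text.
    actions = {
        "wrinkled": "rub_to_open",
        "flat": "direct_insertion",
    }
    return actions.get(fused_state)
```

For the example values above, `choose_action("wrinkled", [0.95, 0.88, 0.15])` selects the rubbing mode, because at least one camera is above the threshold.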

[0168] Example 3: Model Training and Optimization Strategies

[0169] Data collection and labeling

[0170] A total of 50,000 frames of image data were collected, covering different materials (PP weaving, PE film), different colors, different lighting (daytime, nighttime, strong light, weak light), and different bag opening states (fully open, half open, closed, various pleat shapes).

[0171] Labeling strategy:

[0172] Single-frame annotation: Annotate each image with 2D key points (center of the bag opening, four quadrant points), polygonal outline, and state category.

[0173] Global annotation: For each set of 6 synchronized images, the true 3D coordinates are annotated (obtained through laser tracker-assisted calibration) and serve as the ground-truth supervision signal for the decision-making layer.

[0174] Two-stage training method

[0175] Phase 1: Single-view pre-training. Using single-frame labeled data, a perceptual model is trained independently for each viewpoint. Data augmentation (random rotation, brightness variation, simulated dust occlusion, simulated reflection) is introduced to improve the robustness of the single model. Knowledge distillation is employed, using ResNet-101 as the teacher network to guide the MobileNetV3 student network.

[0176] Phase 2: Fine-tuning the decision layer. Freeze the single-view model parameters and construct a decision fusion network. Input simulated multi-view output vectors (artificial noise can be added to simulate abnormal views), using global 3D annotations as ground truth, and train the weight allocation strategy of the decision network end-to-end. Introduce meta-learning concepts to teach the model "which camera to trust under what circumstances."

[0177] Online iteration

[0178] After system deployment, enable "shadow mode". This means that during actual operation, all low-confidence samples and samples with manual intervention are recorded.

[0179] These challenging examples are uploaded to the cloud server each week for re-annotation and incremental model training.

[0180] The newly generated model version is deployed to the production line via OTA, enabling the system performance to evolve on its own.

[0181] Typical case analysis.

[0182] Case 1: Single-view failure caused by strong reflection

[0183] Scenario: At 3 PM, sunlight was shining directly into the workshop, causing a large area of the footage taken by the three cameras on the east side to be overexposed, and the bag opening features were completely lost.

[0184] System response:

[0185] The confidence heads of the three cameras on the east side output low values.

[0186] The decision-layer attention mechanism detected that the outputs of these three perspectives were significantly inconsistent with the other five perspectives (west side, top view).

[0187] Dynamic weight calculation results: The weights of the three cameras on the east side are reduced, while the weights of the west-side and top-view cameras are automatically increased to 0.2-0.3.

[0188] Result: Relying solely on the remaining five cameras, the system still output high-precision 3D coordinates, and the robotic arm successfully executed the bag-opening action. Log record: "Warning: Camera1,2,3 low confidence. Weight re-assigned."

[0189] Case 2: Misjudgment of shape due to severe wrinkles

[0190] Scenario: The opening of the ton bag is twisted in a spiral shape, which looks like a closed ring when viewed from above, and can easily be misjudged as "closed".

[0191] System response:

[0192] Top-view camera (Camera0) output status: Closed (Prob=0.85).

[0193] The side-view cameras (Camera2, 4, 6) see the gap at the bag opening from the side and output the status as Wrinkle / Open (Prob=0.90).

[0194] The decision layer, considering all perspectives, found a conflict between the "closed" conclusion of the top view and the "open" evidence of the side views. Because the side-view cameras have a higher geometric prior weight in judging "depth" and "gap," the decision layer favored the side-view result.

[0195] Result: The bag was ultimately identified as "Wrinkle," and the robotic arm initiated the "rubbing" program, successfully opening the bag. Using only a top-view camera would have resulted in misjudgment and attempts to insert directly, potentially damaging the bag.

[0196] Case 3: Lens dust pollution

[0197] Scenario: After working continuously for 2 hours, the lens of camera No. 3 became covered with a thin layer of dust, resulting in decreased image contrast.

[0198] System response:

[0199] The image entropy calculation module detected that the image entropy value of Camera3 was lower than the threshold.

[0200] The hard gating mechanism directly sets the weight of Camera3 to 0.

[0201] The system triggered a maintenance alert: "Camera3 lens dirty, please clean".

[0202] Result: While the system issued an alarm, it continued to operate normally using the other 7 cameras, without causing a production line shutdown.
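The entropy check in this case can be illustrated with the Shannon entropy of the grey-level histogram, which drops as contrast is lost. This is a minimal sketch: the function names `image_entropy` and `entropy_gate`, the flat-list pixel format, and the threshold parameter are hypothetical, and the document does not state which entropy definition or threshold value the system uses.

```python
import math
from collections import Counter


def image_entropy(gray_pixels):
    """Shannon entropy (in bits) of the grey-level histogram of a flat pixel list."""
    n = len(gray_pixels)
    if n == 0:
        raise ValueError("empty image")
    counts = Counter(gray_pixels)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def entropy_gate(weight, gray_pixels, min_entropy):
    """Return 0.0 if the image entropy is below min_entropy, else the given weight."""
    return 0.0 if image_entropy(gray_pixels) < min_entropy else weight
```

A uniformly grey image has zero entropy and is always gated out, whereas an image that uses many grey levels evenly keeps its weight.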

[0203] It should be further noted that in the above scheme, the installation heights of the multiple industrial cameras are different, and their optical axes are distributed in a divergent pattern. This results in varying overlapping and non-overlapping areas in the coverage of the ton bag opening region from different perspectives. The multiple industrial cameras are distributed in a "surrounding divergent pattern". Specifically, as shown in Figure 2 and Figure 3, although the optical axes of the multiple industrial cameras are divergent, they are all directed towards the area where the ton bags are placed.

[0204] The “surrounding and diverging” distribution of multiple industrial cameras in this invention has the following effects.

[0205] 1) Eliminate “coplanar blind spots” and “dead zones”.

[0206] The differences in height and angle create complementary fields of view.

[0207] Looking down from a high position, you can see inside the bag opening through the side folds;

[0208] Looking up from a low position can avoid dust accumulation or strong light reflection at the top;

[0209] Lateral angles can capture edge depth information that cannot be seen in frontal views.

[0210] Conclusion: Omnidirectional coverage without blind spots was achieved, significantly reducing the systemic missed detection rate caused by single-morphological deformation.

[0211] 2) Enhance the "depth resolution" of 3D reconstruction.

[0212] The inconsistency in height and the divergence in angle naturally create the conditions for large baseline observation.

[0213] Cameras at different heights create a greater parallax angle at the same point, which significantly improves the sensitivity in the depth direction when calculating 3D coordinates.

[0214] By combining the "virtual projection-optimized fitting" algorithm, this large baseline data can more accurately fit the real spatial surface of the bag opening. Even on a flexible surface without texture, the precise three-dimensional shape can be deduced from the dramatic perspective changes of the outline.

[0215] 3) Improve the reliability of "abnormal perspective removal".

[0216] Due to the significant difference in perspective, the manifestation of interference varies drastically from different viewpoints.

[0217] For example, strong light from above may cause overexposure in a high-angle camera but leave the image of a low-angle, upward-looking camera unaffected; dust on the side may obscure a horizontally aimed lens, while the view from a downward-looking lens remains clear.

[0218] In "adaptive fusion at the decision level", the system can use this difference as a confidence criterion: if the geometric relationship derived from the data of a certain perspective is seriously inconsistent with that of other perspectives at different heights / angles (and does not conform to the perspective rules of that perspective), the system can more confidently identify it as an "outlier" and remove its weight, rather than blindly averaging.

[0219] The embodiments described above are merely examples of several implementations of the present invention, and while the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these modifications and improvements all fall within the scope of protection of the present invention.

Claims

1. A flexible ton bag opening detection method based on multi-view independent inference and decision-layer adaptive fusion, characterized in that it includes the following steps: Step S1, multi-view spatio-temporal synchronous acquisition: a plurality of industrial cameras is deployed in a non-coplanar annular array around the ton bag opening operation area, wherein the installation heights of the industrial cameras differ from each other and their optical axis orientation angles are distributed in a divergent manner, so that the coverage of the ton bag opening area by the individual views has different overlapping and non-overlapping areas; Step S2, single-view independent preprocessing and enhancement: pipeline processing is executed independently for each frame image, including distortion correction based on the camera intrinsic matrix, adaptive de-fogging and de-noising based on histogram statistics, and cropping based on a prior ROI region; in this process, any cross-camera pixel-level stitching, feature map fusion or epipolar constraint matching operation is strictly prohibited, ensuring the physical isolation and computing parallelism of each view's data stream; Step S3, parallel lightweight independent inference: the preprocessed images are respectively input to parallel AI inference instances; each instance shares the backbone network weights and has independent task head parameters; each instance outputs a local observation vector; Step S4, decision-layer adaptive dynamic fusion: a decision fusion network is constructed whose input is the set of local observation vectors and a context vector, wherein the context vector comprises detection statistical features in the historical sliding window; the network calculates the dynamic weight vector of the views through a multi-head self-attention mechanism, the weights being non-negative and summing to 1; Step S5, global 3D pose calculation and anomaly removal: based on the dynamic weights, weighted global optimization is performed, comprising state determination, 3D reconstruction and anomaly suppression; Step S6, closed-loop control and execution: the calculated global pose, including the global attitude angle, is encapsulated as a standard control message and sent to the PLC or robotic arm controller via real-time industrial Ethernet to guide the actuator to perform adaptive bag opening, bag placement, or correction actions.

2. The method for detecting the opening of a flexible ton bag according to claim 1, characterized in that the AI inference instance described in step S3 adopts a composite architecture of "shared backbone - independent branches - knowledge distillation": Backbone network: adopts an improved MobileNetV3-Large or Swin-Tinyv2 architecture, removes the last fully connected layer, and outputs multi-scale feature maps; the backbone weights are shared in memory among all N inference instances, reducing the GPU memory usage to 1/N of the original; Multi-task heads: Regression head: contains two fully connected branches that predict the center point offset and the angle respectively, with a combination of Sigmoid and Linear activation functions; Classification head: uses global average pooling followed by a Softmax layer to output the probabilities of 4 states; Segmentation head: employs a lightweight FPN structure, upsamples the feature map to the original image resolution, and outputs a binarized bag opening mask; Knowledge distillation training strategy: during the offline training phase, a teacher network with a very large number of parameters is introduced, and a KL divergence loss and a feature imitation loss are used to transfer the implicit knowledge of the teacher network to the lightweight student network, enabling the student network to maintain more than 95% accuracy while reducing the number of parameters by 90%.

3. The method for detecting the opening of a flexible ton bag according to claim 2, characterized in that the adaptive dynamic fusion of the decision layer described in step S4 specifically includes the following sub-steps: S3.1 Feature embedding mapping: each local observation vector is converted into a high-dimensional latent vector through a learnable linear mapping layer; S3.2 Context-aware coding: the ambient light intensity, the dust concentration and the relative position of the robotic arm are encoded as a context vector, which is concatenated with or added to the latent vectors; S3.3 Consistency attention calculation: the query matrix, key matrix and value matrix are calculated; the attention score is used to measure the consistency between one view and another; if a view differs significantly from most other views, its average attention score is reduced; S3.4 Dynamic weight generator: the attention-aggregated vector is input into a small MLP network, which outputs unnormalized logits; the final weights are calculated using a temperature-scaled Softmax, in which a learnable temperature parameter controls the smoothness of the weight distribution; S3.5 Confidence gating mechanism: a hard gating strategy is introduced: if the confidence or the image entropy of a view falls below its threshold, the weight of that view is forcibly set to 0, and the remaining weights are then redistributed and renormalized.

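4. The method for detecting the opening of a flexible ton bag according to claim 3, characterized in that the global 3D pose calculation described in step S5 uses a "weighted ray intersection optimization algorithm", the specific mathematical derivation of which is as follows: Let the optical center of the $i$-th camera be located at $\mathbf{c}_i$ in the world coordinate system and let its optical axis direction be the unit vector $\mathbf{d}_i$; then the $i$-th ray can be represented as $\mathbf{r}_i(t) = \mathbf{c}_i + t\,\mathbf{d}_i$; the goal is to find a spatial point $\mathbf{p}$ that minimizes the sum of the squared weighted distances from the point to all rays; the distance from $\mathbf{p}$ to ray $i$ is $\operatorname{dist}(\mathbf{p}, \mathbf{r}_i) = \lVert (\mathbf{I} - \mathbf{d}_i \mathbf{d}_i^{\top})(\mathbf{p} - \mathbf{c}_i) \rVert$; construct the objective function $J(\mathbf{p}) = \sum_i w_i \lVert (\mathbf{I} - \mathbf{d}_i \mathbf{d}_i^{\top})(\mathbf{p} - \mathbf{c}_i) \rVert^2$; taking the derivative of this function with respect to $\mathbf{p}$ and setting it to zero gives the system of linear equations $\sum_i w_i (\mathbf{I} - \mathbf{d}_i \mathbf{d}_i^{\top})\,\mathbf{p} = \sum_i w_i (\mathbf{I} - \mathbf{d}_i \mathbf{d}_i^{\top})\,\mathbf{c}_i$; writing $\mathbf{A} = \sum_i w_i (\mathbf{I} - \mathbf{d}_i \mathbf{d}_i^{\top})$ and $\mathbf{b} = \sum_i w_i (\mathbf{I} - \mathbf{d}_i \mathbf{d}_i^{\top})\,\mathbf{c}_i$, the optimal solution is $\mathbf{p}^{*} = \mathbf{A}^{-1}\mathbf{b}$; because $\mathbf{A}$ may be singular, in actual solutions SVD decomposition or an added regularization term $\lambda \mathbf{I}$ should be used to obtain a robust solution; this method avoids the reliance on precise feature point matching in traditional triangulation methods; even if there are large angular errors in some viewpoints, as long as the weight allocation is reasonable, it can still converge to the global optimal solution.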

5. The method for detecting the opening of a flexible ton bag according to any one of claims 1-4, characterized in that it also includes online adaptation and model hot-switching mechanisms based on meta-learning: Operating condition fingerprint recognition: the system extracts the "operating condition fingerprint" vector of the current environment in real time, which includes illumination histogram features, texture spectrum features, and the historical detection residual distribution; Model library management: multiple dedicated model packages, optimized for different materials and extreme environments, are pre-trained and stored in the cloud; Rapid adaptation: when a significant drift in the operating condition fingerprint is detected, a gradient-based meta-learning algorithm is used to fine-tune the decision layer weights or specific Batch Norm layer parameters at the edge using only a small number of recently collected samples, achieving second-level adaptation; Dynamic topology reconfiguration: supports dynamic addition and removal of camera nodes during operation; when a new camera is connected, the system automatically loads the default initialization parameters, and over subsequent detections the correlation between its output and that of the other cameras is analyzed by unsupervised clustering, so that its initial weight in the decision layer is learned automatically without stopping the machine for recalibration or training.

6. A flexible ton bag opening detection system based on multi-view independent inference and decision-layer adaptive fusion, characterized in that it includes: Perception layer subsystem: consists of 6-8 global-shutter industrial cameras, a synchronous trigger controller, a strobe light source, and a dust cover; the camera mounting bracket has a six-degree-of-freedom adjustment mechanism to support rapid calibration; the installation heights of the multiple industrial cameras are different and their optical axis orientation angles are distributed in a divergent manner, resulting in different overlapping and non-overlapping areas in each view's coverage of the bag opening area; Edge computing subsystem: adopts a heterogeneous computing architecture, including CPU, GPU / NPU and FPGA, and internally runs containerized microservices, including an image acquisition service, an inference engine service, a decision fusion service and a data caching service; Decision fusion engine: has a built-in adaptive weight allocation algorithm library, a 3D reconstruction solver and an anomaly diagnosis module; Human-computer interaction and operation and maintenance terminal: provides a visual interface that displays, in real time, the confidence heatmap of each view, the 3D reconstruction trajectories, the weight distribution pie charts and the system health status, and supports remote model distribution, parameter configuration and log export; Industrial communication gateway: integrates Profinet, EtherCAT, Modbus TCP and OPC UA protocol stacks to achieve bidirectional real-time communication with the upper-level MES system and the underlying PLC.

7. The system according to claim 6, characterized in that the perception layer subsystem has an "active visual feedback" function: if the decision layer finds that the confidence level of all views is low, it can proactively send instructions to adjust the brightness, exposure time, or trigger angle of the strobe light source and attempt resampling in order to obtain higher quality images, rather than directly reporting an error.

8. The system according to claim 6, characterized in that the system also includes a "data closed-loop iteration unit": Difficult case discovery: samples with confidence scores below the threshold, excessively long detection times, or manual corrections are automatically selected and marked as "difficult cases"; Automatic annotation assistance: multi-view geometric constraints are used to generate pseudo-labels that speed up manual annotation; Cloud-based training cluster: difficult cases are regularly uploaded to the cloud, triggering incremental training tasks and generating new model versions; Gray release (staged rollout): the new model is first tested on a single production line or in a simulation environment to verify the performance improvement before being fully pushed to all production line edge nodes.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that when the program is executed by a processor, it implements the method steps of any one of claims 1 to 5, including image preprocessing, parallel inference, attention weight calculation, weighted 3D reconstruction, and control instruction generation.

10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the program, it can schedule data from multiple cameras, run multiple neural network instances in parallel, and calculate the three-dimensional pose of the flexible ton bag opening in real time, thereby controlling the robotic arm to complete automated operations.