Mine safety monitoring method based on AI recognition monitoring system
By using AI-based image acquisition and deep reinforcement learning decision-making models in the monitoring system, the problems of positioning accuracy and attention perception in mine safety monitoring systems under low light and high dust conditions have been solved. This enables proactive risk avoidance of collisions between personnel and vehicles underground, improving the efficiency and safety of mine safety production.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HENAN LONGDEYUN TECHNOLOGY CO LTD
- Filing Date
- 2026-03-13
- Publication Date
- 2026-06-12
AI Technical Summary
Existing mine safety monitoring systems suffer from image quality degradation under low light and high dust conditions, are unable to perceive personnel attention status in real time, and lack the ability to predict future movement trends. This results in inefficient passive alarm modes and an inability to proactively prevent collisions between people and vehicles.
An AI-powered monitoring system is employed to acquire video streams through image acquisition devices, perform image recognition and time-series analysis using edge computing devices, predict motion trajectories using deep reinforcement learning decision models, and send personalized warning commands through smart wearable devices and vehicle-to-everything (V2X) networks to achieve proactive risk avoidance.
It enables proactive decision-making regarding the risk of collisions between personnel and vehicles underground, improves the safety level of collaborative operations between personnel and vehicles underground, avoids accidents through real-time prediction and personalized early warning, and enhances the effectiveness of the safety monitoring system.
Figure CN122201027A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of image recognition and artificial intelligence technology, and specifically to a mine safety monitoring method based on an AI-based recognition and monitoring system.
Background Technology
[0002] Mine safety is a matter of life and property safety. With the acceleration of the intelligent construction of mines, the number of underground scenes with mixed traffic of people and vehicles is increasing. The risk of collision between mobile devices and workers has become a core challenge in mine safety monitoring. Building an intelligent safety monitoring system that can perceive the status of people and vehicles in real time, accurately predict movement trends, and actively avoid collision risks has become an urgent need in the field of mine safety production.
[0003] Existing mine safety monitoring systems typically employ a combination of ultra-wideband (UWB) positioning and video surveillance. UWB positioning technology uses underground base stations to receive signals from tags carried by personnel or equipment, achieving sub-meter-level location awareness. The video surveillance system uses explosion-proof cameras deployed in key areas of the tunnels to capture real-time underground footage. Personnel at the surface monitoring center manually observe any anomalies in the footage or trigger intrusion alarms via pre-set electronic fences. When personnel are detected entering a dangerous area or equipment is speeding, the system triggers an audible and visual alarm to alert on-site personnel to safety.
[0004] However, the aforementioned existing technologies have the following drawbacks: video surveillance image quality deteriorates significantly under low light and high dust conditions, and the reliance on manual observation is inefficient and prone to missed alarms. Furthermore, alarms based on electronic fences only focus on spatial location and cannot perceive the attention state of personnel, such as being distracted by looking at a mobile phone or having their gaze deviate from the direction of oncoming vehicles. This means that even if personnel are in a safe area, they may still be unaware of the danger and a collision may occur. Existing systems adopt a passive alarm mode, that is, they issue alarms when or after a risk occurs, lacking the ability to predict the future movement trends of personnel and equipment, and are unable to take proactive intervention measures to avoid accidents before a collision occurs. The warning instructions are fixed and do not take into account the current attention state of personnel. Distracted personnel may miss the opportunity to avoid danger because they do not receive an effective warning, which greatly reduces the effectiveness of the warning.
Summary of the Invention
[0005] To address the aforementioned technical problems, this invention provides a mine safety monitoring method based on an AI-based identification and monitoring system.
[0006] The technical solution adopted in this invention is as follows:
[0007] A mine safety monitoring method based on an AI-based recognition and monitoring system, wherein the AI-based recognition and monitoring system includes image acquisition equipment, edge computing equipment, and smart wearable devices deployed in the mine, and the method includes the following steps:
[0008] S1. The image acquisition device acquires a video stream of the tracked object in real time; the edge computing device performs image recognition processing on the video stream to obtain the location information of each tracked object and the attention state information of the personnel in the tracked object, wherein the tracked object includes moving targets and personnel;
[0009] S2. The edge computing device performs temporal image analysis on the video stream and extracts the motion feature sequence of each tracked object; the motion feature sequence and the attention state information are input into a pre-trained spatiotemporal trajectory prediction model to obtain the predicted motion trajectory of each tracked object in a future preset time period;
[0010] S3. The edge computing device constructs a deep reinforcement learning decision model, which includes a state space, an action space, and a reward function. The state space is composed of the predicted motion trajectory, the attention state information, and the topological constraints of the underground roadway identified from the video stream. The action space consists of deceleration commands, steering commands, and stopping commands for moving targets, as well as audio-visual warning commands and vibration warning commands for personnel. The reward function includes a collision penalty term, a safe distance maintenance reward term, and a traffic efficiency penalty term.
[0011] S4. The edge computing device inputs the state space of all tracked objects at the current moment into the deep reinforcement learning decision model. The deep reinforcement learning decision model calculates the Q value of each action according to the preset policy network and selects the action combination that maximizes the Q value as the intervention decision at the current moment.
[0012] S5. Based on the intervention decision, send corresponding control commands to the moving target through the vehicle network, and at the same time send corresponding warning commands to the personnel through the smart wearable device to perform active risk avoidance operations.
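The decision loop in S3 and S4 can be pictured with a minimal sketch. The reward below combines the three terms named in S3, and the selection step picks the action combination with the largest Q value as in S4. The action names, the weights, the 2 m safe distance, and the dictionary standing in for the policy network output are all illustrative assumptions and are not values given by the method.

```python
from itertools import product

# Hypothetical action sets; the names are illustrative only.
VEHICLE_ACTIONS = ("none", "decelerate", "steer", "stop")
PERSON_ACTIONS = ("none", "audio_visual", "vibration")


def reward(collided, min_distance_m, delay_s,
           safe_distance_m=2.0, w_collision=100.0, w_safe=1.0, w_delay=0.1):
    """Collision penalty + safe-distance reward + traffic-efficiency penalty.

    All weights and the safe distance are assumed values for illustration.
    """
    collision_term = -w_collision if collided else 0.0
    safe_term = w_safe if min_distance_m >= safe_distance_m else 0.0
    efficiency_term = -w_delay * delay_s
    return collision_term + safe_term + efficiency_term


def select_intervention(q_values):
    """Return the (vehicle_action, person_action) pair with the largest Q value.

    q_values is a dict standing in for the output of the policy network;
    pairs missing from the dict are treated as never preferable.
    """
    return max(product(VEHICLE_ACTIONS, PERSON_ACTIONS),
               key=lambda pair: q_values.get(pair, float("-inf")))
```

In a deployed system the Q values would come from the trained network rather than a lookup table; the sketch only shows how the three reward terms and the maximizing selection fit together.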
[0013] In S1, the edge computing device performs image recognition processing on the video stream to obtain the location information of each tracked object, including: acquiring real-time video streams through an explosion-proof binocular camera in the image acquisition device, and simultaneously receiving ranging signals sent by UWB tags carried by the tracked objects through an ultra-wideband UWB base station; employing a tightly coupled fusion algorithm, injecting the ranging signals as constraints into the graph optimization process of visual simultaneous localization and mapping (SLAM), and using video frame feature point matching results to assist in identifying and eliminating UWB multipath errors, outputting the centimeter-level three-dimensional coordinates and heading angle of the tracked object as the location information.
[0014] In S1, the edge computing device performs image recognition processing on the video stream to obtain attention state information, including: acquiring eye images of personnel through a high frame rate eye-tracking camera in the image acquisition device; performing eye key point detection and head posture estimation on the eye images to extract the gaze direction vector; transforming the gaze direction vector from the camera coordinate system to the three-dimensional space coordinate system of the tunnel, and calculating the gaze point coordinates of the personnel in the tunnel space by combining the three-dimensional coordinates in the location information; and generating the personnel's attention state information based on the positional relationship between the gaze point coordinates and the preset hazard source area, wherein the attention state information includes a focused state and a distracted state.
[0015] The spatiotemporal trajectory prediction model in S2 is a hybrid model combining the Long Short-Term Memory Network (LSTM) and the Transformer. The hybrid model takes the motion feature sequence and the attention state information as input, extracts temporal dependency features through LSTM, captures long-distance dependencies through Transformer, and outputs the predicted motion trajectory of each tracked object in the next 3 to 5 seconds.
[0016] The deep reinforcement learning decision model built by the edge computing device in S3 adopts a hierarchical reinforcement learning architecture, including an upper-layer path planner and a lower-layer action executor. The upper-layer path planner uses a roadway topology map constructed from the topological constraints of the underground roadway identified from the video stream as its cognitive basis, and aims to minimize global travel time and avoid collisions to plan a macroscopic driving path for each moving target. The lower-layer action executor receives the macroscopic driving path output by the upper-layer path planner and, combined with the predicted motion trajectory and attention state information of the surrounding tracked objects in real time, outputs specific deceleration commands, steering commands, or stopping commands.
[0017] The warning instructions sent to the person through the smart wearable device in S5 include: sending personalized warning signals through the vibration module and sound and light alarm built into the smart bracelet or safety helmet. The intensity and mode of the warning signal are dynamically adjusted according to the person's attention state information.
[0018] In S5, the control commands sent to the moving target via the vehicle network include: sending deceleration commands, steering commands, or stopping commands to the vehicle's automatic braking system or control system; wherein, when the collision risk exceeds a preset threshold, the stopping command is executed in emergency stopping mode, and the priority and parameters of the command are calculated and determined in real time based on the collision risk level and the attention status information of the surrounding people.
[0019] The tightly coupled fusion algorithm further includes: when insufficient feature points in the video frame or jitter in the UWB signal is detected, fusing the inertial measurement unit (IMU) data through Kalman filtering to output continuous and smooth position information.
[0020] The generation of attention state information also includes: combining the person's head posture and gaze duration to determine whether they are in a fatigued state, and using fatigue state as an additional dimension of attention state information.
[0021] The training of the spatiotemporal trajectory prediction model further includes: constructing a generative adversarial network (GAN), wherein the generator of the GAN generates a motion feature sequence corresponding to a virtual human-vehicle dangerous interaction trajectory based on real motion feature sequences and attention state information; the virtual motion feature sequence is mixed with the real motion feature sequence to enhance the training of the spatiotemporal trajectory prediction model, so that the spatiotemporal trajectory prediction model learns the motion patterns of people and moving targets in extreme scenarios.
[0022] The beneficial effects of this invention are:
[0023] This invention constructs a state space using a deep reinforcement learning decision model, combining predicted motion trajectories, attention state information, and roadway topological constraints. It defines an action space encompassing vehicle control and personnel early warning, along with a reward function that integrates safety and efficiency, achieving a paradigm shift from passive alarm to proactive decision-making regarding underground personnel-vehicle collision risks. Through a hierarchical reinforcement learning architecture, the upper-level path planner and lower-level action executor collaborate in decision-making, organically combining global path optimization with micro-dynamic risk avoidance. By using a policy network to calculate action Q-values in real time and select the optimal action combination, combined with a closed-loop mechanism of execution state feedback and trajectory re-prediction, it achieves adaptive decision optimization for dynamically changing underground environments. This invention enables mine safety monitoring systems to proactively output tiered control commands and personalized early warning commands before personnel-vehicle collisions occur, improving the inherent safety level of underground personnel-vehicle collaborative operations.
Attached Figure Description
[0024] Figure 1 is a flowchart of a mine safety monitoring method based on an AI-based identification and monitoring system according to an embodiment of the present invention;
[0025] Figure 2 is an overall flowchart of a mine safety monitoring method based on an AI-based identification and monitoring system according to an embodiment of the present invention;
[0026] Figure 3 is a flowchart illustrating a deep reinforcement learning-based active risk avoidance decision-making process according to an embodiment of the present invention.
Detailed Implementation
[0027] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0028] As shown in Figures 1-3, in the mine safety monitoring method based on the AI recognition monitoring system of the present invention, the AI recognition monitoring system includes image acquisition equipment, edge computing equipment, and smart wearable devices deployed in the mine. The method includes the following steps:
[0029] S1. The video stream of the tracked objects in the mine is acquired in real time through the image acquisition device; the edge computing device performs image recognition processing on the video stream to obtain the location information of each tracked object and the attention state information of the personnel among the tracked objects, wherein the tracked objects include moving targets and personnel.
[0030] In one embodiment of the present invention, the edge computing device in S1 performs image recognition processing on the video stream to obtain the location information of each tracked object, including: acquiring real-time video streams through an explosion-proof binocular camera in the image acquisition device, and simultaneously receiving ranging signals sent by UWB tags carried by the tracked objects through an ultra-wideband UWB base station; employing a tightly coupled fusion algorithm, injecting the ranging signals as constraints into the graph optimization process of visual simultaneous localization and mapping (SLAM), and using the video frame feature point matching results to assist in identifying and eliminating UWB multipath errors, outputting the centimeter-level three-dimensional coordinates and heading angle of the tracked objects as location information.
[0031] In addition, the tightly coupled fusion algorithm also includes: when insufficient feature points in the video frame or jitter in the UWB signal is detected, the inertial measurement unit (IMU) data is fused through Kalman filtering to output continuous and smooth position information.
[0032] In this embodiment of the invention, the target location information is calculated with a tightly coupled fusion algorithm of UWB ranging and visual SLAM. In addition, an inertial measurement unit (IMU) combined with Kalman filtering is introduced as a positioning completion scheme, which addresses the loss of positioning accuracy caused by the multipath effect of underground UWB signals and by insufficient feature points in video frames. The final output is target location information containing centimeter-level three-dimensional coordinates and a heading angle. In a specific embodiment of the present invention, the explosion-proof binocular cameras can be deployed at key locations such as intersections of underground roadways, working faces, and main equipment travel routes, at a height of 2.5-3 meters, with a field-of-view overlap between adjacent cameras of no less than 30%. Each camera has a resolution of 1920×1080 and a frame rate of 30fps, meets the Ex d I Mb explosion-proof rating, and can collect real-time video streams and output binocular disparity maps. The UWB base stations can be deployed in a grid pattern on both sides of the roadway, with a spacing of no more than 50 meters and an operating frequency band of 6.5-7.5GHz. All tracked objects are equipped with miniature UWB tags with a refresh rate of 10Hz, which send ranging signals to the base stations in real time. The IMU is a six-axis unit (three-axis acceleration + three-axis angular velocity) integrated with the UWB tag in the same hardware module, with a sampling rate of 100Hz, and collects inertial motion data of the tracked objects in real time. The edge computing devices can be deployed in the underground substations and are equipped with an embedded GPU computing unit, so that the algorithms run in real time at the edge and the data collected by each device is received and processed in one place.
[0033] The complex metal supports and large equipment obstructing the underground environment can easily lead to multipath effects in UWB signals, causing deviations in the original ranging values. Therefore, it is necessary to identify and eliminate errors in the original UWB ranging signal using the video frame feature point matching results of visual SLAM. Specifically, the edge computing device can use the ORB feature extraction algorithm to perform inter-frame processing on the video stream of the explosion-proof binocular camera. After extracting feature points for each frame, the KNN matching algorithm is used to complete the inter-frame feature point matching, and the feature point matching confidence score is calculated. This confidence score is the ratio of the number of successfully matched feature points to the total number of extracted feature points. The closer the confidence score is to 1, the more reliable the visual tracking result. The feature point matching threshold is set to 0.7. If the calculated confidence level is lower than this threshold, it is determined that the UWB signal is affected by multipath interference, and the original UWB ranging value at that moment is directly discarded. If the confidence level is not lower than this threshold, the ranging value is determined to be valid. The UWB multipath error estimate is obtained by back-calculating the inter-frame motion estimation result based on visual SLAM, and the original ranging value is corrected using this estimate to obtain the corrected valid ranging value.
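The gating rule in this paragraph can be sketched as follows. The 0.7 threshold and the definition of the confidence score come from the text above; the function names are hypothetical, and the way the multipath error estimate is applied (subtracted from the raw range) is an assumption, since the text only states that the raw value is corrected using the estimate.

```python
MATCH_THRESHOLD = 0.7  # feature point matching threshold stated in the text


def match_confidence(num_matched, num_extracted):
    """Ratio of successfully matched feature points to extracted feature points."""
    if num_extracted <= 0:
        return 0.0
    return num_matched / num_extracted


def gate_uwb_range(raw_range_m, num_matched, num_extracted, multipath_error_m=0.0):
    """Return the corrected range in meters, or None when the value is discarded.

    multipath_error_m stands in for the estimate back-calculated from the
    visual SLAM inter-frame motion result; computing it is outside this sketch.
    Subtracting it from the raw range is an assumption of this sketch.
    """
    if match_confidence(num_matched, num_extracted) < MATCH_THRESHOLD:
        return None  # below threshold: the raw ranging value is discarded
    return raw_range_m - multipath_error_m
```

For example, a frame with 60 matches out of 100 extracted points falls below the threshold and the range is dropped, while 80 out of 100 keeps the range and applies the correction.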
[0034] Visual SLAM constructs a map of the downhole environment and calculates the pose of the tracked object by estimating the pose of feature points in video frames. This invention injects the corrected UWB ranging values as hard constraints into the graph optimization process of visual SLAM, achieving tight coupling and fusion of the two rather than simple result stitching, thereby improving positioning accuracy. Specifically, this fusion process first completes depth estimation using the disparity map of the binocular cameras. Combined with inter-frame matching of ORB feature points, the initial pose of the tracked object in consecutive frames can be calculated using the optical flow method, while simultaneously constructing the pose nodes and constraint edges of visual SLAM. Then, based on the initial pose values, a total error function containing visual constraints and UWB ranging constraints is constructed, and the UWB ranging constraints are added as ranging edges to the graph optimization network of visual SLAM. Next, using the g2o graph optimization framework, the total error function is minimized using the Gauss-Newton method, and the pose nodes are iteratively optimized until convergence, with a convergence error not exceeding 1e-6, to obtain the optimal pose of the tracked object. Finally, the three-dimensional coordinates of the tracked object are calculated from the optimal pose. The heading angle is obtained by combining the inter-frame heading angle estimation results of the binocular camera. The heading angle estimation accuracy is ±0.5°, and the final output is the initial position information. The positioning accuracy can reach ±3cm, which meets the requirements for centimeter-level positioning in underground wells.
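To make the idea of adding ranging edges to the optimization concrete, the following is a deliberately reduced sketch. It does not use g2o and does not optimize full poses: it estimates a single three-dimensional position by Gauss-Newton iteration on an error function with one visual term (distance to the position from visual SLAM) and one term per UWB range. The weights, the function names, and the reduction to a point position are assumptions made for illustration; only the Gauss-Newton minimization and the 1e-6 convergence tolerance are taken from the text.

```python
import math


def solve3(a, b):
    """Solve the 3x3 linear system a x = b by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = sum(m[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = (m[i][3] - s) / m[i][i]
    return x


def fuse_position(p_visual, anchors, ranges, w_visual=1.0, w_uwb=4.0,
                  tol=1e-6, max_iter=50):
    """Minimize w_visual*|p - p_visual|^2 + w_uwb*sum((|p - anchor| - range)^2).

    Weights are assumed values; anchors are UWB base station positions.
    """
    p = list(p_visual)
    for _ in range(max_iter):
        # Normal equations H * step = -g, starting from the visual term.
        h = [[w_visual if i == j else 0.0 for j in range(3)] for i in range(3)]
        g = [w_visual * (p[i] - p_visual[i]) for i in range(3)]
        for anchor, measured in zip(anchors, ranges):
            diff = [p[i] - anchor[i] for i in range(3)]
            dist = math.sqrt(sum(d * d for d in diff))
            if dist == 0.0:
                continue  # direction undefined at the anchor itself
            unit = [d / dist for d in diff]
            err = dist - measured
            for i in range(3):
                g[i] += w_uwb * err * unit[i]
                for j in range(3):
                    h[i][j] += w_uwb * unit[i] * unit[j]
        step = solve3(h, [-gi for gi in g])
        p = [p[i] + step[i] for i in range(3)]
        if math.sqrt(sum(s * s for s in step)) <= tol:
            break
    return p
```

With exact ranges and a slightly offset visual estimate, the fused position is pulled from the visual estimate toward the point consistent with the ranges, which is the effect the ranging edges are meant to have in the full graph.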
[0035] When extreme situations occur underground, such as strong dust obstruction or complete equipment blockage, the number of feature points extracted from video frames will decrease significantly, or the UWB signal will experience severe jitter due to obstruction. In such cases, the positioning results of visual-UWB fusion will be interrupted, requiring the triggering of a positioning completion scheme combining Kalman filtering and IMU to ensure continuous and smooth output of position information. This step sets two completion trigger thresholds: a threshold of 20 feature points extracted per frame (if the number of feature points is lower than this value, visual SLAM cannot complete effective pose estimation), and a threshold of 0.05m for the UWB signal jitter coefficient (the standard deviation of the ranging values of 5 consecutive sampling points; if the coefficient exceeds this value, it indicates that the UWB signal is unstable). Completion is initiated when either of the two thresholds is met. During the completion process, the fused positioning result of the last frame before completion is used as the initial value of the state vector for the Kalman filter. The initial value of the covariance matrix is set as the identity matrix. The state transition matrix can be calibrated according to the IMU sampling rate. The process noise covariance matrix and the observation noise covariance matrix can be obtained through offline calibration using actual downhole data. Subsequently, a real-time prediction-update iteration is performed according to the state equation and the observation equation. The prior state value at the current moment is obtained from the prediction stage. Then, the prior value is corrected using the inertial data of the IMU as the observation value to obtain the posterior state value. Finally, the smoothed position information after fusion of the IMU is calculated from the posterior state value. 
The positioning accuracy during the completion stage is ±5cm, which is slightly lower than the positioning accuracy of vision-UWB fusion, but it can effectively ensure the continuity of the positioning results without interruption or jump.
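The two trigger conditions and the predict-update cycle can be sketched as below. The thresholds (20 feature points, 0.05m jitter over 5 samples), the identity initial covariance, and the 100Hz step come from the text. The one-axis state [position, velocity], the use of an IMU-derived velocity as the observation, the population standard deviation, and the noise values q and r are assumptions of this sketch; the text obtains the noise matrices by offline calibration.

```python
from statistics import pstdev

MIN_FEATURES = 20         # per-frame feature point threshold from the text
MAX_UWB_JITTER_M = 0.05   # jitter coefficient threshold from the text
JITTER_WINDOW = 5         # consecutive sampling points from the text


def needs_completion(num_features, recent_ranges_m):
    """True when either trigger condition is met."""
    window = recent_ranges_m[-JITTER_WINDOW:]
    jitter = pstdev(window) if len(window) == JITTER_WINDOW else 0.0
    return num_features < MIN_FEATURES or jitter > MAX_UWB_JITTER_M


class AxisKalman:
    """One-axis filter with state [position, velocity] (assumed state layout)."""

    def __init__(self, position, velocity=0.0, dt=0.01, q=1e-4, r=1e-2):
        self.x = [position, velocity]        # seeded from the last fused result
        self.p = [[1.0, 0.0], [0.0, 1.0]]    # identity initial covariance
        self.dt, self.q, self.r = dt, q, r   # q and r are placeholder values

    def predict(self):
        dt, (p00, p01), (p10, p11) = self.dt, self.p[0], self.p[1]
        self.x = [self.x[0] + dt * self.x[1], self.x[1]]
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        n00 = p00 + dt * (p10 + p01) + dt * dt * p11 + self.q
        self.p = [[n00, p01 + dt * p11], [p10 + dt * p11, p11 + self.q]]

    def update(self, imu_velocity):
        # Observation matrix H = [0, 1]: only velocity is observed (assumption).
        s = self.p[1][1] + self.r
        k0, k1 = self.p[0][1] / s, self.p[1][1] / s
        innovation = imu_velocity - self.x[1]
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        (p00, p01), (p10, p11) = self.p[0], self.p[1]
        self.p = [[p00 - k0 * p10, p01 - k0 * p11],
                  [p10 - k1 * p10, p11 - k1 * p11]]
        return self.x[0]  # smoothed position after this step
```

A caller would check `needs_completion` on every frame and, once it returns True, create one filter per axis from the last fused result and call `predict` followed by `update` at the IMU rate until the fused result becomes available again.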
[0036] In one embodiment of the present invention, the edge computing device in S1 performs image recognition processing on the video stream to obtain attention state information, including: acquiring eye images of a person through a high frame rate eye-tracking camera in the image acquisition device; performing eye key point detection and head posture estimation on the eye images to extract the gaze direction vector; transforming the gaze direction vector from the camera coordinate system to the three-dimensional space coordinate system of the tunnel, and calculating the gaze point coordinates of the person in the tunnel space by combining the three-dimensional coordinates in the position information; and generating the person's attention state information based on the positional relationship between the gaze point coordinates and the preset hazard source area, wherein the attention state information includes a focused state and a distracted state.
[0037] In addition, the generation of attention state information also includes: combining the person's head posture and gaze duration to determine whether they are in a fatigued state, and using fatigue state as an additional dimension of attention state information.
[0038] The core of personnel attention state recognition is high-frame-rate eye tracking combined with head posture estimation and coordinate system transformation. Through refined processing of the personnel's eye images, the gaze direction is extracted and the coordinates of the gaze point in the tunnel space are calculated. Combined with the preset hazard source areas, the focused / distracted state of the personnel is determined. Simultaneously, the head posture offset angle and the fixed-gaze duration are integrated to quantify the fatigue state, which serves as an additional dimension of the attention state. The high-frame-rate eye-tracking camera is integrated into the front of the explosion-proof safety helmet of the underground worker. It can be a miniature wide-angle camera with a resolution of 1280×720, a frame rate of 100fps, a sampling rate of 1kHz, and an acquisition distance of 10~15cm. The lens is fitted with an anti-fog and anti-dust film to keep the eye images clear underground. The camera is integrated with the helmet's IMU so that head posture data is acquired at the same time. The edge computing device uses a lightweight neural network to process the eye images transmitted from the helmet in real time, completing key point detection, posture estimation, coordinate transformation, and other operations with low processing latency.
[0039] The eye images captured by the underground safety helmet camera suffer from uneven lighting and slight motion blur, requiring preprocessing before extracting key eye points to ensure the accuracy of subsequent gaze direction extraction. Specifically, the edge computing device sequentially performs Gaussian denoising, histogram equalization, and region of interest cropping on the captured eye images, retaining only the core eye area to reduce subsequent computation. Then, a lightweight MediaPipe eye key point detection model is used to extract 468 3D key points of the eye, with 12 core key points, including the pupil center, inner and outer corners of the eye, and iris edge, being selected and their coordinates in the pixel coordinate system are output. The accuracy of this detection method can reach ±1 pixel, which can meet the actual requirements of gaze direction extraction.
[0040] A person's head posture directly affects the judgment of the gaze direction. Therefore, it is necessary to first estimate and compensate for the head posture, and then extract the true gaze direction vector that represents the person's actual gaze direction. Specifically, head posture estimation is based on facial images captured by the helmet camera. After extracting 68 facial feature points, the PnP algorithm is used in conjunction with the camera intrinsic parameter matrix to calculate the rotation matrix and translation vector of the head relative to the camera coordinate system, thereby obtaining the head posture offset angle. This angle is the angle between the actual head posture and the manually calibrated normal working posture, where the normal working posture is the person's head looking straight ahead. Subsequently, the pixel coordinates of the pupil center and the corners of the eye are converted into a unit gaze direction vector in the camera coordinate system. This vector is the original gaze direction, without considering the head posture. The original gaze direction vector is then rotated by the head rotation matrix to complete the head posture compensation and obtain the true gaze direction vector. This vector eliminates the influence of head nodding, turning, and tilting on the gaze direction and accurately represents the person's actual gaze direction.
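The compensation step itself is a rotation of a unit vector. The sketch below shows only that step; the head rotation matrix would come from the PnP result, and the `yaw_matrix` helper, the axis convention (y axis taken as vertical), and all function names are assumptions introduced to make the example self-contained.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector has no direction")
    return [c / n for c in v]


def rotate(matrix, v):
    """Multiply a 3x3 rotation matrix by a 3-vector."""
    return [sum(matrix[i][j] * v[j] for j in range(3)) for i in range(3)]


def yaw_matrix(angle_deg):
    """Rotation about the y axis, assumed here to be the vertical axis."""
    a = math.radians(angle_deg)
    return [[math.cos(a), 0.0, math.sin(a)],
            [0.0, 1.0, 0.0],
            [-math.sin(a), 0.0, math.cos(a)]]


def compensate_head_pose(raw_gaze, head_rotation):
    """Rotate the original gaze direction by the head rotation matrix."""
    return normalize(rotate(head_rotation, normalize(raw_gaze)))
```

With this convention, a gaze that points straight along the camera z axis combined with a head turned 90 degrees about the vertical axis yields a true gaze direction along the x axis.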
[0041] The gaze direction vector is a vector in the camera coordinate system and needs to be transformed to the three-dimensional spatial coordinate system of the underground roadway (a unified spatial coordinate system in the mine, upon which all equipment is calibrated) in order to accurately calculate the actual gaze point coordinates of personnel in the roadway. Specifically, the intrinsic and extrinsic parameters of the camera are first calibrated. The intrinsic parameters can be calibrated offline using the Zhang Zhengyou calibration method for the safety helmet eye-tracking camera to obtain an intrinsic parameter matrix containing the camera focal length and principal point coordinates. The extrinsic parameters can be calibrated using a three-dimensional model map of the underground roadway to determine the rigid body transformation parameters between the camera coordinate system and the three-dimensional spatial coordinate system of the roadway, including the rotation matrix and translation vector. The extrinsic parameters are updated in real time based on the personnel position provided by the UWB positioning results. The coordinates of the personnel's eyes in the camera coordinate system are then converted to the coordinates in the three-dimensional spatial coordinate system of the tunnel. The coordinates of the personnel's eyes in the camera coordinate system are obtained by estimating the depth of the key points of the eyes within the camera. Finally, the coordinates of the gaze point are calculated by combining the real line-of-sight vector and the line-of-sight depth. The line-of-sight depth is a dynamic value calibrated by the underground environment and will be adjusted according to the working environment of the area where the personnel are currently located. 
The line-of-sight depth in the tunnel travel area is 5-10 meters, and the line-of-sight depth in the working face is 0.5-2 meters, so as to ensure the authenticity of the gaze point coordinates.
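The final step of this calculation, placing the gaze point along the line of sight at the calibrated depth, can be sketched as follows. The depth ranges are the ones given above; taking the midpoint of a range as the working depth, the zone keys, and the function names are assumptions of this sketch, since the text only states that the depth is calibrated per area.

```python
import math

# Line-of-sight depth ranges in meters, as stated in the text.
LINE_OF_SIGHT_DEPTH_M = {"travel": (5.0, 10.0), "working_face": (0.5, 2.0)}


def assumed_depth(zone):
    """Midpoint of the calibrated range (an assumption of this sketch)."""
    low, high = LINE_OF_SIGHT_DEPTH_M[zone]
    return (low + high) / 2.0


def rotate(matrix, v):
    """Multiply a 3x3 rotation matrix by a 3-vector."""
    return [sum(matrix[i][j] * v[j] for j in range(3)) for i in range(3)]


def gaze_point(eye_world, gaze_cam, cam_to_world, depth_m):
    """Gaze point = eye position + depth * unit gaze direction in tunnel coordinates.

    eye_world is the eye position already expressed in the tunnel coordinate
    system; cam_to_world is the rotation part of the extrinsic parameters.
    """
    direction = rotate(cam_to_world, gaze_cam)
    norm = math.sqrt(sum(c * c for c in direction))
    return [eye_world[i] + depth_m * direction[i] / norm for i in range(3)]
```

Because the extrinsic parameters are updated from the positioning result, `cam_to_world` and `eye_world` would be refreshed on every frame before this function is called.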
[0042] By comparing the calculated gaze point coordinates with the spatial location of pre-defined hazard areas underground in real time, the system accurately determines a person's focused / distracted state. Specifically, the spatial calibration of hazard areas is first completed. Using underground mining 3D modeling software, hazards such as mobile equipment driving areas, open surfaces, equipment operating areas, and flammable and explosive areas are manually marked, defining the 3D spatial range of each hazard and storing it in the edge computing device's database. The database supports real-time updates, allowing for recalibration of hazard areas after scenarios such as work face movement. After calibration, the system continuously checks whether the gaze point coordinates are within the spatial range of the hazard area. If the gaze point coordinates are within this range, the person is considered focused, with their gaze directed at the hazard area and possessing hazard perception ability. If the gaze point coordinates are outside this range, the person is considered distracted, with their gaze deviating from the hazard area, such as looking down at a phone or looking in an unrelated direction, and lacking hazard perception ability.
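The containment test described here can be sketched with hazard areas modeled as axis-aligned boxes. Representing each area by two opposite corners is an assumption made for brevity; the text only requires that each hazard has a defined three-dimensional spatial range. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HazardZone:
    """Axis-aligned box in tunnel coordinates (assumed shape), in meters."""
    name: str
    min_corner: tuple
    max_corner: tuple

    def contains(self, point):
        return all(lo <= c <= hi
                   for lo, c, hi in zip(self.min_corner, point, self.max_corner))


def attention_state(gaze_point, zones):
    """'focused' if the gaze point lies inside any hazard zone, else 'distracted'."""
    return "focused" if any(z.contains(gaze_point) for z in zones) else "distracted"
```

Recalibrating after a work face moves then amounts to replacing the list of zones held by the edge computing device.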
[0043] To avoid missed warnings caused by relying solely on the focused / distracted state, this invention incorporates fatigue as an additional dimension of attention state information. Fatigue is quantified by the head posture offset angle and the duration of fixed gaze, so that human-caused risk scenarios are comprehensively covered. Based on the head posture estimation results, the edge computing device calculates the three-dimensional angle between the actual head posture and the normal working posture, and takes the maximum of the pitch, yaw, and roll angles as the final head posture offset angle; a higher value indicates a more abnormal head posture. Simultaneously, changes in the gaze point coordinates are monitored in real time. If the gaze point moves ≤0.5m within a continuous time period, the gaze is determined to be fixed and the time is accumulated; if the gaze point moves more than 0.5m, the accumulated time is reset. The fatigue quantification value is then calculated by a formula that weights the head posture offset angle and the fixed-gaze duration, using a head posture offset weight, a gaze duration weight, and a preset critical gaze time. The closer the value is to 1, the higher the level of personnel fatigue. A fatigue threshold is set: if the fatigue quantification value reaches the threshold, the person is determined to be in a state of fatigue; otherwise, the person is considered to be in a non-fatigue state. Finally, the focus / distraction quantification value and the fatigue quantification value are concatenated as features to obtain multi-dimensional attention state information for personnel. This provides human factor characteristic data for subsequent trajectory prediction and reinforcement learning early warning decision-making. For example, when a person is distracted or fatigued, the system dynamically increases the intensity of the early warning instruction.
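The fixed-gaze timing and fatigue quantification can be sketched as below. The 0.5 m movement threshold and the use of the largest of the pitch, yaw, and roll deviations follow the text. The scoring function itself is an assumption: it takes a weighted sum of the two terms, each capped at 1, and the default weights, the 45° normalizing angle, and the 10 s critical gaze time are hypothetical placeholders. Measuring movement from the point where the current fixation started is also an interpretation.

```python
import math


class FixationTimer:
    """Accumulates how long the gaze point has stayed within 0.5 m of where it settled."""

    def __init__(self, move_threshold_m=0.5):
        self.move_threshold_m = move_threshold_m
        self.anchor = None       # gaze point at the start of the current fixation
        self.duration_s = 0.0

    def update(self, gaze_point, dt_s):
        moved = (self.anchor is None
                 or math.dist(gaze_point, self.anchor) > self.move_threshold_m)
        if moved:
            self.anchor = tuple(gaze_point)   # start a new fixation, reset the timer
            self.duration_s = 0.0
        else:
            self.duration_s += dt_s           # gaze is fixed, keep accumulating
        return self.duration_s


def head_offset_deg(pitch, yaw, roll):
    """Largest deviation (degrees) from the normal working posture."""
    return max(abs(pitch), abs(yaw), abs(roll))


def fatigue_score(offset_deg, fixation_s, w_head=0.5, w_gaze=0.5,
                  offset_max_deg=45.0, critical_s=10.0):
    """ASSUMED form: weighted sum of two terms, each normalized to [0, 1]."""
    head_term = min(offset_deg / offset_max_deg, 1.0)
    gaze_term = min(fixation_s / critical_s, 1.0)
    return w_head * head_term + w_gaze * gaze_term
```

With weights that sum to 1, the score stays in [0, 1], which matches the statement that values closer to 1 indicate higher fatigue.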
[0044] In this embodiment of the invention, step S1 effectively solves the technical problems of low positioning accuracy and lack of personnel attention perception in existing mine safety monitoring technologies through multi-device collaborative deployment, multi-algorithm fusion processing, and multi-dimensional perception analysis. This step achieves centimeter-level positioning (±3cm) through tightly coupled fusion of vision and UWB. Combined with an IMU and Kalman filter completion scheme, it ensures that positioning results remain continuous, without interruption or jumps, in complex underground environments with obstructions and dust. Regarding personnel attention perception, it not only achieves accurate discrimination of focused / distracted states but also adds the fatigue state as an additional dimension, comprehensively covering human-caused risk scenarios such as distraction and fatigue and avoiding the missed warnings caused by single-state discrimination. Simultaneously, all algorithms are processed in real time on edge computing devices, with edge processing latency ≤200ms and no cloud transmission latency, meeting the technical requirements for real-time monitoring and rapid early warning in underground mines. The output position information and attention state information are both standardized vectors and can be used directly as input data for the spatiotemporal trajectory prediction model.
[0045] S2. The edge computing device performs temporal image analysis on the video stream, extracting motion feature sequences for each tracked object. The motion feature sequences and attention state information are then input into a pre-trained spatiotemporal trajectory prediction model to obtain the predicted motion trajectory of each tracked object within a preset future time period. The spatiotemporal trajectory prediction model is a hybrid model combining a Long Short-Term Memory (LSTM) network and a Transformer. This hybrid model takes the motion feature sequences and attention state information as input, extracts temporal dependency features through LSTM, captures long-distance dependencies through Transformer, and outputs the predicted motion trajectory of each tracked object within the next 3 to 5 seconds.
[0046] In one embodiment of the present invention, the training of the spatiotemporal trajectory prediction model further includes: constructing a generative adversarial network (GAN), wherein the generator of the GAN generates a motion feature sequence corresponding to a virtual human-vehicle dangerous interaction trajectory based on real motion feature sequences and attention state information; the virtual motion feature sequence is mixed with the real motion feature sequence to enhance the training of the spatiotemporal trajectory prediction model, so that the spatiotemporal trajectory prediction model learns the motion patterns of people and moving targets in extreme scenarios.
[0047] In this embodiment of the invention, this step is based on the centimeter-level three-dimensional position information of the tracked object and the multi-dimensional attention state information of the personnel output in step S1. The motion feature sequence of the tracked object is extracted through temporal image analysis. A hybrid spatiotemporal trajectory prediction model combining Long Short-Term Memory Network (LSTM) and Transformer is used to achieve high-precision prediction of the motion trajectory of tracked objects such as underground moving targets and personnel in the next 3 to 5 seconds. Furthermore, the hybrid model is enhanced by extreme scenario training through Generative Adversarial Network (GAN) to solve the technical problem of poor model prediction generalization ability caused by insufficient samples in dangerous interaction scenarios between people and vehicles underground.
[0048] Specifically, this step first completes the extraction and fusion of motion feature sequences. Based on the output data from step S1, basic data preprocessing is performed. A timestamp synchronization algorithm is used to align the 10Hz position data with the 30fps video stream, so that one set of position information corresponds to every 3 video frames, eliminating temporal misalignment errors. Then, a sliding window filter with a window size of 5 is used to remove outliers whose single-frame position change exceeds 2m, ensuring the stability of the data source. Finally, all motion features are normalized to avoid biases in model training and inference caused by differences in dimensions. The normalization formula is:
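[0049] $$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$;
[0050] where $x$ is the original feature value, $x'$ is the normalized feature value, and $x_{\min}$ and $x_{\max}$ are the feature extreme values calibrated for the actual underground scenario, such as the speed extreme values of 0~5m / s, which correspond to the speed limit of the underground mobile equipment.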
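The preprocessing steps can be sketched as follows. The window size of 5, the 2 m jump limit, and the 0 to 5 m/s speed range come from the text. Replacing an outlier with the per-axis median of the most recent accepted samples is an assumption of this sketch, since the text does not say how a removed sample is filled in.

```python
import math
import statistics


def remove_jumps(positions, window=5, max_jump_m=2.0):
    """Replace samples that jump more than max_jump_m from the previous accepted
    sample with the per-axis median of the last `window` accepted samples."""
    cleaned = []
    for p in positions:
        if cleaned and math.dist(p, cleaned[-1]) > max_jump_m:
            recent = cleaned[-window:]
            p = tuple(statistics.median(axis) for axis in zip(*recent))
        cleaned.append(tuple(p))
    return cleaned


def min_max(value, lo, hi):
    """Min-max normalization using calibrated extreme values lo and hi."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return (value - lo) / (hi - lo)


track = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),
         (50.0, 0.0, 0.0),             # spurious jump
         (0.4, 0.0, 0.0)]
cleaned = remove_jumps(track)
speed_norm = min_max(2.5, 0.0, 5.0)    # speed range 0-5 m/s from the text
```

At 10Hz and a 5 m/s speed limit a genuine displacement between samples is at most 0.5 m, so a 2 m jump can safely be treated as a measurement error.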
[0051] After preprocessing, 10 consecutive frames (corresponding to 1 second, calibrated for the underground scene; this duration allows effective temporal features to be extracted while keeping prediction real-time) are used as the feature extraction unit. Based on the aligned position information, 6-dimensional core motion features of the tracked object are extracted, covering the temporal changes in position, velocity, acceleration, and heading angle. The single-frame motion features are represented as $m_t = (\Delta x_t, \Delta y_t, \Delta z_t, v_t, a_t, \Delta\theta_t)$. Based on the motion features of 10 consecutive frames, a 10×6-dimensional motion feature sequence is constructed, which fully characterizes the continuous motion pattern of the tracked object within 1 second. The three-dimensional position change $(\Delta x_t, \Delta y_t, \Delta z_t)$ is the difference in three-dimensional coordinates between frame t and frame t−1 and reflects the instantaneous displacement direction of the tracked object. In underground roadways, horizontal movement is dominant and $\Delta z_t$ (height) changes typically approach 0, which allows the model to adapt automatically to this scene characteristic. The instantaneous velocity $v_t$ is the ratio of the displacement distance in frame t to the time interval (0.1 seconds). The instantaneous acceleration $a_t$ is the ratio of the velocity difference between frame t and frame t−1 to the time interval; it reflects the acceleration and deceleration state of the tracked object (such as braking of mobile equipment or sudden stopping of personnel). The change in heading angle $\Delta\theta_t$ is the difference in heading angle between frame t and frame t−1; it reflects the turning state of the tracked object (such as mobile equipment changing lanes or people suddenly turning around).
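A minimal sketch of the 6-dimensional feature extraction is given below. The 0.1 second interval follows the text. Deriving the heading angle from the horizontal displacement direction is an assumption of this sketch (the heading could equally come from the IMU), and the function name is hypothetical. Because acceleration and heading change both need the previous frame's values, 12 position samples are needed to produce 10 feature rows.

```python
import math


def wrap_angle(a):
    """Wrap an angle difference into (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def motion_features(positions, dt=0.1):
    """Return rows (dx, dy, dz, v, a, dtheta) from consecutive 3D positions."""
    rows = []
    prev_v = prev_heading = None
    for (x0, y0, z0), (x1, y1, z1) in zip(positions, positions[1:]):
        dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
        v = math.sqrt(dx * dx + dy * dy + dz * dz) / dt   # instantaneous speed
        heading = math.atan2(dy, dx)                      # ASSUMED: heading from displacement
        if prev_v is not None:
            a = (v - prev_v) / dt                         # instantaneous acceleration
            rows.append((dx, dy, dz, v, a, wrap_angle(heading - prev_heading)))
        prev_v, prev_heading = v, heading
    return rows


# Object moving along the roadway axis at a steady 1 m/s, sampled at 10 Hz
positions = [(0.1 * i, 0.0, 0.0) for i in range(12)]
sequence = motion_features(positions)   # 10 rows x 6 features
```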
[0052] Finally, the personnel attention state information $S_{att}$ output in step S1 is used as a global feature and concatenated column-wise with the motion feature sequence $M$ to form the final input sequence of the hybrid model. The fusion formula is $X = [\,M \;\|\; \mathbf{1}_{10}\,S_{att}\,]$, where $M$ is the 10×6 motion feature sequence, $S_{att}$ is the 1×2 attention state vector repeated for each of the 10 frames, and $\|$ denotes column concatenation. The fused input sequence $X$ is 10×8-dimensional (10 frames × (6-dimensional motion features + 2-dimensional attention features)).
[0053] For moving targets, which lack attention state information, the attention state is set to (1,0) (default: no distraction, no fatigue). For personnel, their current attention state is input in real time. The model automatically learns the correlation between attention states and motion characteristics. For example, distracted individuals show a small velocity and small fluctuations in heading angle, characterized by "slow linear movement with no turning response"; fatigued personnel show an acceleration approaching 0, which manifests as "uniform motion with no acceleration or deceleration response".
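The fusion step can be sketched in a few lines. The default attention state (1, 0) for moving targets and the resulting 10×8 shape follow the text; the function name is hypothetical.

```python
DEFAULT_ATTENTION = (1.0, 0.0)   # moving targets: no distraction, no fatigue


def fuse(motion_sequence, attention=None):
    """Append the 2-dimensional attention state to every 6-dimensional motion row."""
    if attention is None:
        attention = DEFAULT_ATTENTION
    if len(attention) != 2:
        raise ValueError("attention state must be (focus value, fatigue value)")
    return [tuple(row) + tuple(attention) for row in motion_sequence]


motion = [(0.1, 0.0, 0.0, 1.0, 0.0, 0.0)] * 10        # 10 x 6 motion features
vehicle_input = fuse(motion)                          # uses the default (1, 0)
person_input = fuse(motion, attention=(0.2, 0.7))     # hypothetical distracted, fatigued person
```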
[0054] This fusion approach allows the prediction model to not only predict trajectories based on motion patterns, but also to adjust the prediction results by incorporating the human factors involved, thus significantly improving the accuracy of personnel trajectory prediction.
[0055] Based on the fused input sequence, this invention uses an LSTM+Transformer hybrid model to predict trajectories. The model is a serial architecture, made lightweight for underground scenarios, and its core parameters are calibrated based on edge computing capabilities and underground prediction requirements. The LSTM has two layers with a hidden layer dimension of 64 and a batch size of 8. The Transformer has 1 encoder layer with multi-head attention and a feedforward dimension of 128, with 2 linear layers and an output dimension of 3. This reduces computational cost at the edge while maintaining accuracy in feature extraction and prediction. The input sequence $X$ is first fed into the LSTM layer, where the gating mechanism formed by the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$ extracts short-term temporal dependency features. The gating and cell state calculations are:
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh(c_t)$$
The LSTM outputs a 10×64-dimensional hidden state sequence $H$. Each hidden state $h_t$ contains the correlation information between the motion features of frame t and the historical temporal features, providing the temporal feature basis for the Transformer layer. Here, $\sigma$ is the Sigmoid activation function, tanh is the hyperbolic tangent activation function, $\odot$ is the element-wise product, $h_t$ is the hidden state of the LSTM at time t (that is, the extracted temporal dependency features), $W$ and $U$ are the input and hidden layer weight matrices of each gate, and $b$ is the bias term. The hidden state sequence is then input into the Transformer layer, which captures long-distance dependencies through a multi-head self-attention mechanism. First, $H$ is passed through three independent linear layers to obtain the query vector $Q$, key vector $K$, and value vector $V$. Single-head self-attention is then calculated as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
and the multi-head results are combined as:
$$\mathrm{MultiHead}(H) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_n)\,W^{O}$$
After the multi-head results are concatenated and linearly transformed, and then processed through layer normalization and a feedforward neural network, a 1×64-dimensional global comprehensive feature $g$ is output. This feature integrates the short-term temporal motion patterns and long-distance motion dependencies of the tracked object. Here, $d_k$ is the dimension of the key vector, $\sqrt{d_k}$ is the scale factor used to avoid excessively large inner products, and $W^{O}$ is the output weight matrix for multi-head attention. Finally, the global comprehensive feature is input into the linear output layer, and the predicted motion trajectory is obtained as $\hat{p}_t = g\,W_{out} + b_{out}$, where $W_{out}$ is the output weight matrix (64×3) and $b_{out}$ is the output bias term (1×3), both obtained through offline pre-training and online fine-tuning, and $\hat{p}_t$ is the predicted 3D coordinates at second t. The model supports configurable prediction horizons from 3 to 5 seconds, and edge computing devices can switch automatically based on the risk level of the underground area. High-risk areas such as roadway intersections and working faces are predicted over 5 seconds, while medium- and low-risk areas such as straight roadways and unmanned working areas are predicted over 3 seconds, balancing risk avoidance response time with model inference speed.
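The single-head self-attention step can be illustrated with a small pure-Python sketch of scaled dot-product attention. It omits the learned linear projections that produce Q, K, and V, as well as the multi-head concatenation, layer normalization, and feedforward network, so it should be read as an illustration of the scaling and softmax weighting only. The toy inputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(Q, K):
    """Row-wise softmax of Q K^T / sqrt(d_k)."""
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    return [softmax([sum(a * b for a, b in zip(q, k)) / scale for k in K]) for q in Q]


def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted average of the rows of V."""
    weights = attention_weights(Q, K)
    d_v = len(V[0])
    return [[sum(w * v[j] for w, v in zip(row, V)) for j in range(d_v)]
            for row in weights]


# Toy example: identical keys give uniform weights, so the output is the mean of V
Q = [[1.0, 0.0]]
K = [[0.0, 0.0], [0.0, 0.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = scaled_dot_product_attention(Q, K, V)
```

In the model described above, Q, K, and V would each have 10 rows, one per hidden state, so that every frame can attend to every other frame in the 1-second window.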
[0056] To address the issue of decreased model prediction accuracy due to insufficient samples in extreme scenarios of dangerous human-vehicle interactions underground, this invention constructs a Generative Adversarial Network (GAN). Using real motion feature sequences and attention state information as conditions, it generates motion feature sequences corresponding to virtual human-vehicle dangerous interaction trajectories. The virtual sequences are mixed with real sequences as a training set to enhance the training of the LSTM+Transformer hybrid model. This enables the model to learn the motion patterns of personnel and moving targets in extreme scenarios, significantly improving the trajectory prediction accuracy in extreme scenarios.
[0057] A Generative Adversarial Network (GAN) consists of a generator (G) and a discriminator (D), which operate in a zero-sum game relationship. When trained to Nash equilibrium, the generator can produce virtual sequences that are highly similar to the feature sequences of real dangerous scenes, and the discriminator cannot distinguish between real and virtual sequences. The generator G employs a deconvolution and LSTM architecture. It takes the real motion feature sequence, the attention state information, and random noise z (which enhances the diversity of the virtual sequences) as input, and outputs a virtual dangerous motion feature sequence. Its training goal is to make the virtual sequence approximate the real dangerous sequence as closely as possible. The discriminator D employs a convolutional and fully connected architecture. It takes either a real motion feature sequence or a virtual dangerous motion feature sequence as input and outputs the probability that the sample is a real sequence. Its training objective is to accurately distinguish between real and virtual sequences.
[0058] GAN training is divided into offline pre-training and online reinforcement training. Offline pre-training can be completed on a training server on the mine surface. Specifically, normal motion data and a small amount of real dangerous motion data (such as people suddenly entering the lane or mobile equipment braking suddenly) are collected in the mine to build a real dataset. After the weights of the generator G and the discriminator D are randomly initialized, the two networks are trained alternately: the generator loss and the discriminator loss are each calculated according to their respective formulas, and the weights of the two networks are updated separately using gradient descent. When the discriminator's recognition accuracy stabilizes at around 50% (unable to distinguish between real and virtual sequences), Nash equilibrium is determined to have been reached and training stops. In the loss formulas, $\mathbb{E}$ denotes the expectation operation and $p_{data}$ denotes the distribution of the real motion feature sequences.
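The loss calculation and stopping rule can be sketched as follows. The sketch assumes the standard minimax GAN objective, which the zero-sum description suggests but which may differ from the exact loss formulas used in this embodiment. The tolerance and patience values in the stopping rule are hypothetical.

```python
import math

EPS = 1e-12   # keeps log() finite if the discriminator outputs exactly 0 or 1


def _mean_log(values):
    return sum(math.log(min(max(v, EPS), 1.0 - EPS)) for v in values) / len(values)


def discriminator_loss(d_real, d_fake):
    """ASSUMED standard form: -(E[log D(x)] + E[log(1 - D(G(z)))])."""
    return -(_mean_log(d_real) + _mean_log([1.0 - p for p in d_fake]))


def generator_loss(d_fake):
    """ASSUMED standard minimax form: E[log(1 - D(G(z)))]."""
    return _mean_log([1.0 - p for p in d_fake])


def at_equilibrium(accuracy_history, tol=0.02, patience=5):
    """True when the last `patience` discriminator accuracies are all within tol of 50%."""
    if len(accuracy_history) < patience:
        return False
    return all(abs(a - 0.5) <= tol for a in accuracy_history[-patience:])


# d_real / d_fake are discriminator outputs (probability of "real") on each batch
loss_d = discriminator_loss([0.5, 0.5], [0.5, 0.5])
loss_g = generator_loss([0.5, 0.5])
```

When the discriminator outputs 0.5 for every sample, the discriminator loss equals 2·ln 2, which is the value expected at equilibrium under this objective.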
[0059] Online reinforcement training is completed on edge computing devices. The virtual dangerous motion feature sequences generated by the GAN are mixed with real motion feature sequences at a ratio of 1:4 to construct an enhanced training set, and the LSTM+Transformer hybrid model is fine-tuned online on the edge computing device. Specifically, the mean squared error is used as the loss function, $L_{MSE} = \frac{1}{N}\sum_{i=1}^{N}\lVert \hat{p}_i - p_i \rVert^2$, which measures the deviation between the predicted trajectory $\hat{p}_i$ and the real / virtual trajectory $p_i$ over $N$ trajectory points; the smaller the value, the higher the prediction accuracy. A batch of 32 samples is used, and the hybrid model is fine-tuned online with a small learning rate of 1e-5. Furthermore, the edge computing device fine-tunes the model once a day using real-time underground motion data, so that the model adapts to changes in underground motion scenarios. After GAN-enhanced training, the trajectory prediction accuracy of the hybrid model in extremely dangerous scenarios is improved, allowing it to predict the motion trajectories of distracted individuals entering lanes, mobile equipment making emergency lane changes and braking, and fatigued individuals moving slowly and irregularly.
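The construction of the enhanced training set and the mean squared error loss can be sketched as below. The 1:4 virtual-to-real ratio follows the text; sampling the virtual sequences at random, the fixed seed, and the function names are assumptions of this sketch.

```python
import random


def mix_training_set(virtual, real, seed=0):
    """Mix virtual and real sequences at 1:4 (one virtual per four real) and shuffle."""
    n_virtual = min(len(virtual), len(real) // 4)
    rng = random.Random(seed)
    mixed = rng.sample(virtual, n_virtual) + list(real)
    rng.shuffle(mixed)
    return mixed


def trajectory_mse(predicted, target):
    """Mean over trajectory points of the squared Euclidean distance."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(sum((a - b) ** 2 for a, b in zip(p, q))
                for p, q in zip(predicted, target))
    return total / len(predicted)


virtual = [("virtual", i) for i in range(50)]
real = [("real", i) for i in range(100)]
training_set = mix_training_set(virtual, real)
loss = trajectory_mse([(0.0, 0.0, 0.0)], [(1.0, 2.0, 2.0)])
```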
[0060] The original predicted trajectory output by the hybrid model after GAN-enhanced training may contain minor jumps (such as excessive coordinate changes between adjacent seconds) or violate the topological constraints of the underground tunnel (such as the predicted trajectory crossing the tunnel wall). Post-processing on the edge computing device is therefore necessary to ensure that the predicted trajectory is smooth and physically reasonable for the scene, ultimately outputting a standardized predicted motion trajectory. Specifically, post-processing includes three stages: trajectory smoothing, tunnel topological constraint correction, and standardized output. Trajectory smoothing can employ cubic spline interpolation: the 3D coordinates of the original predicted trajectory are used as interpolation nodes to construct a cubic spline function, the coordinate values are recalculated every 0.1 seconds over the next 3-5 seconds, and the values are then averaged per second to eliminate minor jumps in the original trajectory, ensuring that the trajectory conforms to the actual motion patterns of the tracked object underground. Tunnel topological constraint correction uses the underground tunnel topological constraints identified from the video stream. First, it is determined whether each three-dimensional coordinate of the predicted trajectory lies within the effective space of the tunnel, that is, within the tunnel width and the clearance height, and at a distance from obstacles that satisfies the safety threshold. If a coordinate exceeds the effective space of the tunnel, the shortest path method is used to correct the trajectory so that it extends along the effective space of the tunnel while remaining as close as possible to the original predicted trajectory, preserving the original movement trend of the tracked object. After smoothing and correction, the edge computing device outputs a standardized predicted motion trajectory.
All three-dimensional coordinates in the trajectory are centimeter-level coordinates in the three-dimensional spatial coordinate system of the underground tunnel, consistent with the coordinate system of the position information output in step S1. This allows for direct splicing with the state space of step S3, achieving seamless integration with the deep reinforcement learning decision model.
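The topological constraint correction can be illustrated with a deliberately simplified sketch. It assumes a straight tunnel section whose effective space is an axis-aligned box (x along the tunnel, y across it, z vertical), in which case clamping each coordinate yields the nearest valid point. This stands in for the shortest path method named above and does not handle curved tunnels or obstacles; the parameter names and sample values are hypothetical.

```python
def clamp(value, lo, hi):
    return max(lo, min(hi, value))


def correct_to_tunnel(trajectory, half_width_m, clearance_m, margin_m=0.0):
    """Move every predicted point to the nearest point inside the effective space.

    ASSUMED geometry: tunnel axis along x, walls at y = +/- half_width_m,
    floor at z = 0, roof at z = clearance_m. margin_m is a hypothetical safety offset.
    """
    y_lim = half_width_m - margin_m
    return [(x, clamp(y, -y_lim, y_lim), clamp(z, 0.0, clearance_m - margin_m))
            for x, y, z in trajectory]


raw = [(10.0, 0.5, 1.6), (11.0, 2.8, 1.6), (12.0, 1.0, 3.5)]   # second point crosses the wall
corrected = correct_to_tunnel(raw, half_width_m=2.0, clearance_m=3.0)
```

Points already inside the effective space are left unchanged, which preserves the original movement trend wherever the prediction was already feasible.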
[0061] This step addresses the technical challenges of insufficient temporal dependency capture, missing extreme scenario samples, and poor real-time performance in downhole trajectory prediction by combining an LSTM+Transformer hybrid model with GAN extreme scenario enhancement training. It also achieves deep fusion of human characteristics and motion features, enabling trajectory prediction to not only be based on the movement patterns of the tracked object but also to incorporate corrections based on personnel distraction and fatigue, significantly improving the accuracy of personnel trajectory prediction. The lightweight, adapted model achieves an inference latency of ≤100ms on edge computing devices, combined with a prediction duration of 3-5 seconds, providing ample reaction time for subsequent proactive risk avoidance decisions.
[0062] S3. Edge computing devices construct a deep reinforcement learning decision model, which includes a state space, an action space, and a reward function. The state space consists of predicted motion trajectories, attention state information, and topological constraints of the underground roadway identified from the video stream. The action space consists of deceleration commands, steering commands, and stopping commands for moving targets, as well as audible and visual warning commands and vibration warning commands for personnel. The reward function includes a collision penalty term, a safe distance maintenance reward term, and a traffic efficiency penalty term.
[0063] In one embodiment of the present invention, the deep reinforcement learning decision model constructed by the edge computing device in S3 adopts a hierarchical reinforcement learning architecture, including an upper-layer path planner and a lower-layer action executor. The upper-layer path planner uses the tunnel topology map constructed from the tunnel topology constraints identified from the video stream as the cognitive basis, and plans a macroscopic driving path for each moving target with the optimization objective of minimizing global travel time and avoiding collisions. The lower-layer action executor receives the macroscopic driving path output by the upper-layer path planner, and combines it with the predicted motion trajectory and attention state information of the surrounding tracked objects perceived in real time, and outputs specific deceleration commands, steering commands, or stopping commands.
[0064] This step is used to achieve accurate decision-making and proactive risk avoidance for collisions between people and vehicles underground. Based on the personnel attention state information output from step S1 and the predicted motion trajectory of the tracked object output from step S2, combined with the underground roadway topological constraints obtained from video stream image recognition, a deep reinforcement learning decision-making model adapted to complex underground mining scenarios is constructed. This model innovatively adopts a hierarchical reinforcement learning architecture with an upper-level path planner and a lower-level action executor. By accurately defining the state space, action space, and reward function, a complete decision-making model framework is formed. All model construction processes, algorithm design, and parameter calibration are completed on edge computing devices. Furthermore, it has been lightweighted and customized for underground scenarios, balancing decision accuracy, real-time performance, and engineering practicality. It effectively solves the technical problems of poor decision generalization ability and imbalance between safety and traffic efficiency in traditional reinforcement learning models in underground topological environments.
[0065] Specifically, considering the characteristics of underground roadways, which have fixed topology, narrow spaces, and strong physical constraints on pedestrian and vehicle movement, this invention abandons the traditional single-layer reinforcement learning architecture and adopts a hierarchical reinforcement learning architecture with an upper-layer path planner and a lower-layer action executor. The two layers each perform their respective functions and make collaborative decisions. Both are deployed on edge computing devices and exchange information in real time through an internal data bus. The overall decision latency is ≤50ms, meeting the real-time requirements of active risk avoidance in underground environments. The upper-layer path planner is a slow decision layer. Based on the topological constraints of the underground roadway obtained from video stream image recognition, it constructs a gridded lightweight topological map of the underground roadway. This map is not a full 3D model of the underground roadway, but a gridded map (0.5m × 0.5m grid size, suitable for centimeter-level positioning accuracy) formed by extracting the core topological features of the roadway (lane boundaries, obstacle locations, roadway intersections, and traffic direction restrictions), which significantly reduces computational load. This layer aims to minimize global travel time and avoid global collision risks, employing an improved A* algorithm to plan a macroscopic driving path for each moving target. By adding a collision risk cost term to the cost function of the traditional A* algorithm, the optimized cost function becomes:
[0066] $$f(n) = g(n) + h(n) + \lambda \cdot r(n)$$;
[0067] where $g(n)$ is the actual cost from the starting point to the current node $n$; $h(n)$ is the heuristic cost from the current node to the destination; $r(n)$ is the collision risk value of the current node; and $\lambda$ is the risk cost weight, calibrated for the underground scenario, which is used to balance the optimization objectives of traffic efficiency and risk avoidance.
[0068] The final output is a macroscopic driving path composed of grid nodes, $P = \{n_1, n_2, \dots, n_k\}$, which provides global driving constraints for the lower-level action executor, where each $n_i$ is a grid node in the tunnel topology map. The path satisfies the underground rules of "traveling along the roadway, avoiding high-risk areas, and using the shortest route".
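A risk-aware grid search of this kind can be sketched with the standard library. The 0.5 m cell size follows the text; the 4-connected grid, the Manhattan heuristic, and the weight value are assumptions. Note one deliberate design choice: this sketch accumulates the weighted risk of each entered cell into the path cost, which is one common way to realize a risk term and keeps the search optimal for the combined cost. It is not necessarily identical to adding the risk of the current node alone to the evaluation function.

```python
import heapq


def plan_path(risk, start, goal, lam=1.0, cell_m=0.5):
    """A* over a grid. risk[r][c] is None for impassable cells, else a risk value >= 0.

    Step cost into a cell = cell_m + lam * risk of that cell (ASSUMED accumulation).
    Returns the list of grid nodes from start to goal, or None if unreachable.
    """
    rows, cols = len(risk), len(risk[0])

    def heuristic(node):
        # Manhattan distance in metres; admissible because every step costs >= cell_m
        return cell_m * (abs(node[0] - goal[0]) + abs(node[1] - goal[1]))

    open_heap = [(heuristic(start), 0.0, start)]
    best_cost = {start: 0.0}
    parent = {start: None}
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if cost > best_cost[node]:
            continue                      # stale heap entry
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or risk[nr][nc] is None:
                continue
            new_cost = cost + cell_m + lam * risk[nr][nc]
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


# Hypothetical 3x3 map with a high-risk cell in the middle
risk_map = [[0.0, 0.0, 0.0],
            [0.0, 5.0, 0.0],
            [0.0, 0.0, 0.0]]
safe_path = plan_path(risk_map, (1, 0), (1, 2), lam=1.0)
short_path = plan_path(risk_map, (1, 0), (1, 2), lam=0.0)
```

With the risk weight set to 0 the planner reduces to ordinary shortest-path A* and drives straight through the risky cell; with a positive weight it detours around it, which shows how the weight trades traffic efficiency against risk avoidance.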
[0069] The lower-level action executor is a fast decision layer. It receives the macroscopic driving path output by the upper-level path planner, the predicted motion trajectory of all tracked objects for the next 3-5 seconds output by step S2, the personnel attention state information output by step S1, and the dynamic environment information of the underground mine recognized in real time from the video stream. With real-time avoidance of microscopic collision risks as the core optimization objective, it adopts a Deep Q-Network (DQN) optimized for underground scenarios as the core algorithm. At the same time, it introduces an experience replay pool and a target network to avoid oscillation and overfitting problems during model training. Finally, it outputs refined control actions for moving targets and personalized early warning actions for personnel as a candidate set of actions for subsequent intervention decisions.
[0070] In this embodiment of the invention, a two-layer architecture forms a closed-loop collaborative decision-making logic. The upper-layer path planner is a slow decision-making layer, updating the macroscopic travel path every 5 seconds based on the underground roadway topology and the global environment to ensure the rationality of the global decision. The lower-layer action executor is a fast decision-making layer, updating the microscopic action instructions every 100ms to achieve rapid response to the dynamic environment. When the lower layer detects a sudden collision risk, such as personnel suddenly entering the macroscopic path of a moving target, it can temporarily break through the macroscopic path constraints to execute emergency avoidance actions, such as emergency stopping, and send a path update request to the upper layer. Upon receiving the request, the upper layer immediately replans the macroscopic path, realizing dynamic decision-making of global planning, microscopic execution, sudden feedback, and path update, adapting to the dynamic and complex environment of underground roadways.
[0071] The state space is the input foundation of the deep reinforcement learning decision-making model, representing the state of the underground environment and all objects at the current moment. The model learns the features of the state space and outputs corresponding action decisions. The state space of this invention is composed of three core parts: the predicted motion trajectory of the tracked object output in step S2, the personnel attention state information output in step S1, and the underground roadway topological constraints identified from the video stream. A single global state is denoted as $s$. The state space is constructed following the principles of full perception, accurate representation, and scene adaptation, comprehensively covering the key factors that influence underground personnel and vehicle collision risks. Furthermore, feature fusion has been optimized for scenarios in which multiple personnel and vehicles coexist underground, solving the problem of state representation for multiple tracked objects. The predicted motion trajectory of the tracked object is a three-dimensional coordinate sequence for the next 3-5 seconds, which is the core feature for judging collision risk. Personnel attention state information is a two-dimensional vector containing the quantified values of focus / distraction and fatigue, representing the human-caused risk characteristics of personnel's hazard perception ability. The underground roadway topological constraints are a three-dimensional vector $C = (d_{obs}, w_{road}, h_{road})$ identified from the video stream, where $d_{obs}$ is the distance (m) from the tracked object to the nearest obstacle, $w_{road}$ is the current width of the roadway (m), and $h_{road}$ is the roadway clearance height (m). This vector represents the physical constraints underground, which limit the range and amplitude of motion of the tracked object.
[0072] The dimensions and numerical ranges of the features differ greatly: the predicted trajectory coordinates are on the order of hundreds of meters, the attention state values lie between 0 and 1, and the tunnel width is on the order of several meters. To prevent the model from overfitting to features with large numerical values, all features are first normalized so that their values are uniformly mapped to the [0,1] interval.
[0073] In an underground scenario, there are multiple moving targets and personnel. It is necessary to fuse the state features of all tracked objects to form a unified global state vector. Specifically,
[0074] First, the predicted motion trajectories of all moving targets are concatenated into a moving target feature matrix $M_v$, where m is the number of moving targets. Then, the attention state of each person is concatenated with that person's predicted motion trajectory to form a personnel state feature matrix $M_p$, each row of which has the form "predicted motion trajectory + attention state", where h is the number of personnel and "+" indicates feature concatenation. Finally, the moving target feature matrix, the personnel state feature matrix, and the roadway topological constraints $C$ are concatenated column-wise to form a unified global state vector, expressed as $s = [M_v, M_p, C]$. All state vectors are stored in real time in the state cache pool of the edge computing device. The cache pool capacity is set to 1000 and is updated in real time according to timestamps, providing real-time and effective state input for subsequent Q-value calculation.
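The assembly of the global state vector and the state cache pool can be sketched as follows. The ordering of the parts and the cache capacity of 1000 follow the text; flattening each trajectory into a list of coordinates, the function name, and the sample values are assumptions. All inputs are expected to be normalized to [0,1] beforehand.

```python
from collections import deque


def build_global_state(vehicle_trajs, person_trajs, person_attention, topo):
    """Flatten moving target trajectories, personnel (trajectory + attention) rows,
    and the topological constraint vector into one state vector."""
    if len(person_trajs) != len(person_attention):
        raise ValueError("each person needs exactly one attention state")
    state = []
    for traj in vehicle_trajs:                       # m moving targets
        for point in traj:
            state.extend(point)
    for traj, attention in zip(person_trajs, person_attention):   # h personnel
        for point in traj:
            state.extend(point)
        state.extend(attention)                      # "+" = feature concatenation
    state.extend(topo)                               # (d_obs, w_road, h_road)
    return state


state_cache = deque(maxlen=1000)   # oldest entries are dropped automatically

traj = [(0.1, 0.2, 0.0), (0.2, 0.2, 0.0), (0.3, 0.2, 0.0)]   # 3-second prediction
s = build_global_state(
    vehicle_trajs=[traj],
    person_trajs=[traj, traj],
    person_attention=[(1.0, 0.0), (0.3, 0.8)],
    topo=(0.4, 0.5, 0.6),
)
state_cache.append((0.0, s))       # (timestamp, state)
```

The length of the vector depends on the number of tracked objects, so a fixed-size network input would additionally require padding or a cap on object count, which this sketch does not address.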
[0075] The action space is the output candidate set of the deep reinforcement learning decision model, representing all actions that the model can perform in the current state. In this invention, the action space is divided, according to the object acted upon, into a moving target action subset $A_v$ and a personnel action subset $A_p$, and the overall action space is $A = A_v \cup A_p$. Furthermore, all actions are defined and quantized to suit the discrete action decision-making requirements of deep Q-networks. The parameter ranges of the actions are calibrated according to the physical constraints of the underground roadway and the execution capabilities of the hardware equipment, ensuring that the actions can be executed underground. Each action corresponds to specific hardware execution instructions of the Internet of Vehicles and smart wearable devices, achieving a seamless connection between model decision-making and hardware execution.
[0076] It should be noted that the action space is divided into two non-overlapping subsets according to the object of action, corresponding to the active control actions of the moving target and the active warning actions of personnel, respectively. The actions of the two subsets can be executed independently or in combination (such as deceleration of the moving target + audible and visual warning of personnel). The combined actions are called action combinations, which cover all the scenario requirements for personnel and vehicle safety in underground mines.
[0077] The moving target action subset includes three core control actions: deceleration, steering, and stopping. All of these can be executed directly by the automatic braking / control system of the moving target. The deceleration command uses continuous-value quantization, and its encoded value characterizes the speed reduction ratio of the moving target; for example, a value of 0.5 indicates a 50% reduction in speed. In underground scenarios, the minimum reduction ratio is 0.1, so that a deceleration is never too slight to have a safety effect, and the maximum reduction ratio is 0.9, which leaves a 10% idle speed allowance to prevent a complete stall. The steering command also uses continuous-value quantization, and its encoded value characterizes the turning of the moving target; a positive value indicates a right turn, and a negative value indicates a left turn. The maximum turning angle is set at 30° to prevent the equipment from leaving its lane or colliding with the roadway wall because of an excessive turning angle. The stop command uses binary discrete quantization: 0 indicates no stopping action, and 1 indicates a stopping action. The value 1 is further divided into regular stopping and emergency stopping, determined by the collision risk level. The personnel action subset includes two core warning actions: audible and visual warnings and vibration warnings. These can be executed directly by smart wearable devices such as the smart bracelets and explosion-proof helmets worn by personnel. The audible and visual warning command and the vibration warning command both use binary discrete quantization: 0 indicates that the corresponding warning is not executed, and 1 indicates that it is executed. Specific parameters such as the frequency, volume, and amplitude of the warning can be dynamically adjusted according to the person's attention level.
The action space is a discrete finite set, and the quantized and encoded vector of a single action is expressed as:
[0078] ;
[0079] ;
[0080] The quantized encoding values of all actions are stored in the action encoding table of the edge computing device, providing a standardized basis for subsequent Q-value calculation and hardware execution.
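As a rough illustration of the quantized encoding, the following Python sketch clamps each field to the ranges given in the description and emits the five values in the order deceleration ratio, steering angle, stop command, audible and visual warning, vibration warning. The class name `Action`, the use of radians for the steering angle, and the convention that a deceleration of 0.0 means "no deceleration" are assumptions made for this example.

```python
import math
from dataclasses import dataclass

DECEL_MIN, DECEL_MAX = 0.1, 0.9  # reduction-ratio bounds from the description
STEER_MAX = math.pi / 6          # 30 degree limit from the description


@dataclass(frozen=True)
class Action:
    decel: float = 0.0   # 0.0 means "no deceleration" (assumed convention)
    steer: float = 0.0   # radians; positive = right turn, negative = left turn
    stop: int = 0
    av_warn: int = 0
    vib_warn: int = 0

    def encode(self):
        """Return the clamped five-element encoding of this action."""
        if self.decel == 0.0:
            decel = 0.0
        else:
            decel = min(max(self.decel, DECEL_MIN), DECEL_MAX)
        steer = min(max(self.steer, -STEER_MAX), STEER_MAX)
        flags = [int(bool(f)) for f in (self.stop, self.av_warn, self.vib_warn)]
        return [decel, steer] + flags
```

A request such as `Action(decel=0.95, steer=1.0)` would therefore be clipped to the 0.9 ratio and the 30° limit before being written to the action encoding table.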
[0081] The reward function is the core of optimization in the deep reinforcement learning decision model and determines the model's learning direction. This invention constructs the reward function on the core operational principle of underground mining, "safety first, efficiency second". It comprises three core components: a collision penalty term $R_{col}$, a safe distance maintenance reward term $R_{safe}$, and a traffic efficiency penalty term $R_{eff}$. The three components are weighted and fused to obtain the total reward function, which guides the model to converge gradually, during learning, towards the optimal decision direction of "no collisions, maintaining a safe distance, and balancing traffic efficiency". The mathematical expression of the total reward function is as follows:
[0082] $$R = w_1 R_{col} + w_2 R_{safe} + w_3 R_{eff}$$;
[0083] where $w_1$ is the weight of the collision penalty term; its negative sign indicates a penalty, and it has the largest absolute value, reflecting the priority of safety. $w_2$ is the weight of the safe distance maintenance reward term; its positive sign indicates a reward, guiding positive behavior. $w_3$ is the weight of the traffic efficiency penalty term; its negative sign indicates a penalty, and it has the smallest absolute value, so that efficiency is balanced against safety. All weights are calibrated through offline training on historical underground risk avoidance data and through actual field testing, and represent empirically optimal values. Furthermore, they can be dynamically adjusted according to the risk level of the underground area, adapting the reward function to the scenario.
[0084] Specifically, the collision penalty term is a binary discrete penalty, representing the degree of punishment for a collision between a person and a vehicle or a risk of collision after the model performs an action. Its mathematical expression is:
[0085] ;
[0086] where the actual distance between tracked objects after the action is performed is calculated from the predicted motion trajectory, and the minimum safe distance between personnel and vehicles underground is determined according to the mine safety regulations. When the actual distance falls below the minimum safe distance, there is a risk of collision and the model is penalized to the maximum extent, guiding it to actively avoid such actions.
[0087] The safe distance maintenance reward is a continuous value reward, representing the degree of reward for maintaining a safe distance between people and vehicles after the model performs an action. Its mathematical expression is:
[0088] ;
[0089] This formula uses a logarithmic function so that the reward value increases gradually with the actual distance. This prevents the model from pursuing an excessively large safety distance at the expense of underground traffic efficiency, while still providing a positive incentive for proactively maintaining a safe distance. The reward starts from a base value, and the larger the actual distance, the closer the reward is to its maximum value.
[0090] The traffic efficiency penalty term is a continuous value penalty, representing the degree of punishment for the impact of the model's actions on underground traffic efficiency. Its mathematical expression is:
[0091] ;
[0092] where the desired travel speed of a moving target underground is determined by the speed limit standards of the different roadways, such as those for straight roadways and intersections, and the actual speed is the speed at which the target moves after the action is performed. When the actual speed matches the desired speed, no penalty is applied; the larger the deviation between the two, the larger the penalty (the larger the absolute value of the negative value). The term thus specifically penalizes actions such as excessive deceleration and meaningless stopping, achieving a balance between underground safety and traffic efficiency.
[0093] The weighting coefficients of the reward function can be dynamically adjusted for different risk levels in underground areas. In high-risk areas such as roadway intersections and working faces, the three weights are adjusted to further increase the weighting of the safety-related terms and decrease the weighting of the efficiency term. In low-risk areas such as unmanned operation tunnels, the three weights are adjusted to appropriately increase the weight of efficiency and decrease the weight of safety. This achieves refined decision-making that prioritizes safety in high-risk areas and efficiency in low-risk areas.
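The weighted fusion and its risk-dependent adjustment can be sketched in Python as follows. The numeric weights are purely hypothetical placeholders, since the calibrated values are not available here; they only respect the stated ordering (collision weight negative with the largest magnitude, efficiency weight negative with the smallest magnitude). The three term values are passed in as non-negative magnitudes, so no assumption is made about how each term is computed.

```python
# Hypothetical weights (collision, safe distance, efficiency) per risk level.
# The calibrated values of the method are not reproduced here.
WEIGHTS = {
    "high": (-20.0, 2.0, -0.1),
    "medium": (-10.0, 1.0, -0.5),
    "low": (-5.0, 1.0, -0.8),
}


def total_reward(collision, safe_distance, efficiency, risk="medium"):
    """Weighted sum of the three reward terms for the given risk level."""
    w_col, w_safe, w_eff = WEIGHTS[risk]
    return w_col * collision + w_safe * safe_distance + w_eff * efficiency
```

Under these placeholder values, the same collision event costs twice as much in a high-risk area as in a medium-risk one, while an efficiency loss is penalized more heavily in low-risk areas.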
[0094] In this invention, the completed deep reinforcement learning decision-making model needs to undergo offline pre-training and online fine-tuning before being deployed to edge computing devices to ensure the model's adaptability and decision-making accuracy in real-world underground scenarios. The offline pre-training is performed on a training server on the mine surface, using historical underground human and vehicle movement data and hazard avoidance event data to construct a standardized training set for offline pre-training until the model's Q-value loss is <0.01, indicating that the model's loss function has converged, thus obtaining pre-training weights adapted to the underground scenario. In the online fine-tuning stage, the pre-training weights are transferred to the edge computing device, using real-time underground human and vehicle status data for small-batch online fine-tuning. The learning rate is set to 1e-5, and the batch size to 32, allowing the model to quickly adapt to the real-time scene characteristics and movement patterns underground. To adapt to the computing power and storage resources of edge computing devices, the model needs to be lightweighted and optimized. Specifically, the network structure of the deep Q-network is simplified, retaining only two hidden layers with a dimension of 128 for each layer; the experience replay pool capacity is compressed to 5000, storing only the most recent decision experience to reduce the computation of invalid data; and model quantization technology is used to convert the model's floating-point parameters into integer parameters, significantly reducing the model's storage footprint and computational load. 
The lightweight and optimized deep reinforcement learning decision model has a single decision time of ≤50ms on the edge computing device, fully meeting the real-time requirements of active risk avoidance in underground mines, and can stably achieve real-time decision-making on the status of personnel and vehicles underground.
[0095] S4. The edge computing device inputs the state space of all tracked objects at the current moment into the deep reinforcement learning decision model. The deep reinforcement learning decision model calculates the Q value of each action according to the preset policy network and selects the action combination that maximizes the Q value as the intervention decision at the current moment.
[0096] This step involves inputting the current underground global state space into a deep reinforcement learning decision-making model. The model calculates the Q-value (action value function) for each action and action combination using a policy network, specifically a Deep Q-Network (DQN). Based on the quantified value assessment, the action combination maximizing the Q-value is selected as the optimal intervention decision for the current moment. All calculations in this step are performed on edge computing devices. Through lightweight inference optimization of the policy network, temporal difference (TD) updates for Q-value calculation, and global traversal filtering of action combinations, an edge-side decision latency of ≤50ms is achieved, adapting to the real-time risk avoidance needs of dynamically changing personnel and vehicles in underground mines. Simultaneously, the decision-making process strictly adheres to the constraints of a hierarchical reinforcement learning architecture. The micro-action decisions of lower-level action executors are always based on the macro-paths of upper-level path planners, ensuring that the decisions possess both global rationality and micro-level accuracy.
[0097] Specifically, before this step is executed, the policy network must be adapted to the edge device and the state input must be prepared. First, the lightweight DQN policy network designed for the lower-level action executor in step S3 is optimized for the inference phase: redundant training-phase modules are removed, and only the core forward inference path of input layer + 2 hidden layers + output layer is retained. The dimension of the input layer matches the actual dimension of the current global state vector, the dimension of each hidden layer is 128, and the dimension of the output layer is 5, the number of single actions defined in step S3. The output layer uses a linear activation function to directly output the Q-value of each single action. At the same time, the 32-bit floating-point parameters are converted to 8-bit integer parameters using model quantization, and training-only modules such as batch normalization and Dropout are turned off. Single-threaded forward inference runs on the embedded GPU computing core of the edge computing device, and a single inference takes ≤10ms. Then, the global state vector of the current timestamp (time t) is extracted from the state cache pool built in step S3, and validity preprocessing is performed in sequence, including timestamp verification, feature value validity verification, and topological constraint verification, to obtain a valid global state vector. The macroscopic driving path of the moving target output by the upper-level path planner in step S3 is embedded in the state vector as a hard constraint mask, and the features related to the direction of motion of the moving target are weighted by a factor of 1.5. As a result, when the policy network calculates the Q-values, it prioritizes actions consistent with the macro path, and micro decisions do not deviate from the global path.
[0098] After the valid global state vector $s_t$ is fed into the input of the DQN policy network, the forward inference of the single-action Q-values and the update of the target Q-value are completed first, realizing the quantitative calculation of the Q-values and the dynamic optimization of the network parameters. The current Q-value is obtained through the forward calculation of the policy network, and the core calculation formula is:
[0099] $$Q(s_t; \theta) = W_{out}\,\sigma(W_1 s_t + b_1) + b_{out}$$;
[0100] where $W_1$ and $b_1$ are the weight matrix and bias term of the first hidden layer; $W_{out}$ and $b_{out}$ are the weight matrix and bias term of the output layer; $\sigma$ is the ReLU activation function; and $\theta$ denotes the learnable parameters of the policy network. The formula directly outputs the current Q-values of the five single actions (deceleration, steering, stopping, audible and visual warning, and vibration warning), forming the single-action Q-value vector $[Q_{dec}, Q_{steer}, Q_{stop}, Q_{av}, Q_{vib}]$, where $Q_{dec}$ is the current Q-value of the deceleration action, $Q_{steer}$ is the current Q-value of the steering action, $Q_{stop}$ is the current Q-value of the stopping action, $Q_{av}$ is the current Q-value of the audible and visual warning action, and $Q_{vib}$ is the current Q-value of the vibration warning action.
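For illustration, the forward inference of a network with the stated shape (two hidden layers of 128 units with ReLU, and a linear output of 5 Q-values) might look like the NumPy sketch below. The input dimension `STATE_DIM`, the random initialization, and the function names are assumptions; quantization and the path constraint mask are left out.

```python
import numpy as np

STATE_DIM = 64    # hypothetical; the text only says it matches the state vector
HIDDEN_DIM = 128  # hidden layer width from the description
N_ACTIONS = 5     # deceleration, steering, stop, audible-visual, vibration


def init_params(rng, state_dim=STATE_DIM):
    """Create (weight, bias) pairs for input -> 128 -> 128 -> 5."""
    dims = [state_dim, HIDDEN_DIM, HIDDEN_DIM, N_ACTIONS]
    return [
        (rng.normal(0.0, 0.1, (d_in, d_out)).astype(np.float32),
         np.zeros(d_out, dtype=np.float32))
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]


def q_values(params, state):
    """Forward pass: ReLU on the hidden layers, linear output layer."""
    x = np.asarray(state, dtype=np.float32)
    for weight, bias in params[:-1]:
        x = np.maximum(x @ weight + bias, 0.0)
    weight, bias = params[-1]
    return x @ weight + bias
```

Calling `q_values(init_params(np.random.default_rng(0)), state)` returns a length-5 vector whose entries follow the action order used in the encoding.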
[0101] To avoid training oscillations during policy network parameter updates, this invention employs a dual-network structure consisting of a policy network (online network) and a target network (fixed network) to calculate the target Q-value. The core formula is:
[0102] $$y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^{-})$$;
[0103] where $r_t$ is the immediate reward obtained after performing action a in the current state $s_t$; $\gamma$ is the discount factor, representing the degree of decay of future rewards, and is calibrated to 0.95; $s_{t+1}$ is the global state vector at time t+1 (100ms later) predicted from the current state $s_t$, which allows dynamic changes in the underground state to be detected early; $\theta^{-}$ denotes the target network parameters; and $\max_{a'} Q(s_{t+1}, a'; \theta^{-})$ is the maximum Q-value over all single actions in the next state $s_{t+1}$, representing the optimal future reward obtainable at the next moment after executing the current action a.
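The target Q-value described above is the standard one-step temporal difference target. A minimal Python sketch, using the discount factor of 0.95 from the text and a hypothetical function name, is:

```python
import numpy as np

GAMMA = 0.95  # discount factor from the description


def td_target(reward, next_q_target, gamma=GAMMA):
    """Immediate reward plus the discounted best next-state Q-value.

    next_q_target holds the target network's Q-values for the next state.
    """
    return reward + gamma * float(np.max(next_q_target))
```

For a reward of 1.0 and next-state Q-values of 0.0, 2.0, and 1.0, the target is 1.0 + 0.95 × 2.0 = 2.9.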
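[0104] Based on the deviation between the target Q-value and the current Q-value, the policy network parameters are updated online in small batches using a mean squared error loss. The loss function is calculated as follows:
[0105] $$L(\theta) = \frac{1}{n}\left\lVert y_t - Q(s_t; \theta) \right\rVert_2^2$$;
[0106] where $y_t$ is the vector of target Q-values, $Q(s_t; \theta)$ is the vector of current Q-values, $n$ is the number of single actions, and $\lVert \cdot \rVert_2$ is the 2-norm.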
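As a small worked example, a mean squared error over the single-action Q-values (the squared 2-norm of the difference divided by the number of actions) can be computed as below. The function name is hypothetical, and the sketch covers only the loss value, not the gradient step.

```python
import numpy as np


def q_loss(target_q, current_q):
    """Squared 2-norm of (target - current), averaged over the actions."""
    diff = np.asarray(target_q, dtype=float) - np.asarray(current_q, dtype=float)
    return float(diff @ diff) / diff.size
```

If only one of five actions is off by 2.0, the loss is 4 / 5 = 0.8.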
[0107] The stochastic gradient descent (SGD) algorithm is used with a very small learning rate (calibrated for the underground scene) to compute the gradient of the loss value and update the policy network parameters along the gradient descent direction. After each policy network parameter update, the target network parameters are updated synchronously according to the soft update formula:
[0108] $$\theta^{-} \leftarrow \tau\,\theta + (1 - \tau)\,\theta^{-}$$;
[0109] where $\theta$ denotes the policy network parameters, $\theta^{-}$ denotes the target network parameters, and $\tau$ is the soft update rate. This ensures that the target network parameters change slowly, making the calculation of the target Q-value more stable.
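A soft update of this kind blends a small fraction of the online parameters into the target parameters after every step. The sketch below assumes the parameters are held as a list of NumPy arrays; the rate `TAU` is a hypothetical placeholder, since the calibrated value is not available here.

```python
import numpy as np

TAU = 0.01  # hypothetical soft update rate; the calibrated value is not given


def soft_update(target_params, online_params, tau=TAU):
    """Return new target parameters: tau * online + (1 - tau) * target."""
    return [tau * online + (1.0 - tau) * target
            for target, online in zip(target_params, online_params)]
```

With a small rate, the target network trails the online network by many steps, which is what keeps the target Q-value from oscillating.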
[0110] After each decision is made, the resulting state-action-reward-next state quadruple is stored in the experience replay pool constructed in step S3, and expired data in the pool is removed according to the first-in-first-out principle, providing data support for subsequent batch updates of the policy network.
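A first-in-first-out experience replay pool can be sketched with a bounded deque. The capacity of 5000 and the batch size of 32 are taken from the earlier description of the lightweight model; the class and method names are assumptions.

```python
import random
from collections import deque


class ReplayPool:
    """FIFO pool of (state, action, reward, next_state) quadruples."""

    def __init__(self, capacity=5000):
        # A bounded deque discards the oldest quadruple automatically.
        self._buf = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self._buf.append((state, action, reward, next_state))

    def sample(self, batch_size=32):
        """Draw a random mini-batch without replacement."""
        pool = list(self._buf)
        return random.sample(pool, min(batch_size, len(pool)))

    def __len__(self):
        return len(self._buf)
```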
[0111] Since a single action is insufficient for effective hazard avoidance in actual underground scenarios, the moving target action subset and the personnel action subset must be combined to form human-vehicle collaborative action combinations, and the total Q-value of each combination is calculated to provide a quantitative basis for selecting the optimal action combination. For the basic scenario with one moving target and one person, the number of action combinations is 3 × 2 = 6. The total Q-value of a combination is calculated using a weighted sum formula:
[0112] $$Q_{total} = \alpha\, Q_{v} + \beta\, Q_{p}$$;
[0113] where $Q_{v}$ is the Q-value of the selected moving target action, $Q_{p}$ is the Q-value of the selected personnel action, and $\alpha$ and $\beta$ are the corresponding weights (calibrated for the underground scene). The weights embody the risk avoidance principle of taking active control of moving targets as the core and early warning of personnel as an auxiliary.
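The enumeration of the 3 × 2 = 6 combinations and their weighted scores can be sketched as follows. The two weights are hypothetical placeholders chosen only so that the moving target action dominates; the action names are illustrative labels.

```python
from itertools import product

TARGET_ACTIONS = ("decelerate", "steer", "stop")
PERSON_ACTIONS = ("audible_visual", "vibration")
W_TARGET, W_PERSON = 0.7, 0.3  # hypothetical weights, not the calibrated ones


def combo_q(q):
    """Map each (target action, person action) pair to its weighted total Q.

    q maps an action name to its single-action Q-value.
    """
    return {
        (t, p): W_TARGET * q[t] + W_PERSON * q[p]
        for t, p in product(TARGET_ACTIONS, PERSON_ACTIONS)
    }


def best_combo(q):
    """Return the pair with the largest total Q-value."""
    scores = combo_q(q)
    return max(scores, key=scores.get)
```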
[0114] For complex scenarios involving multiple moving targets and multiple people, people and vehicles are first grouped according to their collision risk correlation using a distance clustering algorithm: personnel whose predicted trajectory distance to the same moving target is less than the underground safety distance are assigned to the same group, forming multiple "moving target-associated personnel" groups. The total Q-values of the six action combinations are calculated independently for each group, and the global total Q-value is the mean of the total Q-values over all groups. If a moving target has no associated personnel, only the Q-values of its three single actions are calculated, and the maximum is taken as the total Q-value of the group. If a person has no associated moving target, only the Q-values of that person's two single actions are calculated, and the maximum is taken as the total Q-value of the group. This grouping rule fits the actual risk avoidance need in the mine of "making decisions for whoever is at risk".
[0115] After the total Q-values of all action combinations are calculated, the optimal action combination is screened and then verified a second time, to ensure that the decision satisfies both the model's value assessment and the underground physical constraints and safety rules. First, all action combinations are traversed globally, and the action combination with the largest total Q-value is extracted as the initial optimal intervention decision. To reduce computational cost, a greedy traversal algorithm can be used that iterates only over the 6 action combinations within each group, which lowers the computational complexity. Subsequently, a two-layer secondary verification is performed on the initial optimal decision. The first layer is the underground physical constraint verification, which checks whether the turning angle is ≤π/6 (30°, owing to the narrowness of the underground roadway), whether the predicted position of the moving target after the turn is within the width of the driving lane, whether the deceleration / stopping action matches the braking performance of the mobile equipment, and whether the smart wearable device corresponding to the warning action is online. If any constraint is not met, the combination is eliminated and the action combination with the next largest Q-value is selected. The second layer is the secondary collision risk assessment, which substitutes the initial optimal decision into the spatiotemporal trajectory prediction model of step S2, re-predicts the movement trajectories of the person and the vehicle over the next 3-5 seconds after the action is performed, and calculates the actual distance between the person and the vehicle. If this distance is not less than the minimum safe distance, the decision effectively avoids the collision risk and is retained; otherwise, the emergency decision-making mechanism is triggered, and the combination of stopping action + sound and light warning + vibration warning is directly selected as the final decision.
In high-risk areas such as roadway intersections and working faces, the Q-value traversal will be skipped directly, and the emergency decision-making mechanism will be triggered by default to ensure underground safety.
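The screening and two-layer verification can be summarized in a short Python sketch. It is a simplified reading of the procedure: `physically_feasible` and `predicted_min_distance` are hypothetical callbacks standing in for the constraint checks and the trajectory re-prediction, and falling back to the emergency combination when every candidate is infeasible is an assumption.

```python
EMERGENCY = ("stop", "audible_visual", "vibration")


def select_decision(ranked_combos, physically_feasible, predicted_min_distance,
                    safe_distance, high_risk_area=False):
    """Pick the final decision from combinations sorted by total Q, best first."""
    if high_risk_area:
        # High-risk areas skip the Q-value traversal entirely.
        return EMERGENCY
    for combo in ranked_combos:
        # Layer 1: physical constraints; infeasible combos are eliminated.
        if not physically_feasible(combo):
            continue
        # Layer 2: re-predicted person-vehicle distance after the action.
        if predicted_min_distance(combo) >= safe_distance:
            return combo
        return EMERGENCY
    return EMERGENCY  # assumed fallback when nothing is feasible
```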
[0116] Once the final optimal intervention decision is determined, standardized output and caching operations are performed. First, the final optimal intervention decision is converted into a digital encoding format consistent with the action quantization encoding in step S3, forming a standardized decision instruction set that can be directly recognized by the underground vehicle-to-everything (V2X) system and the smart wearable devices. The positions in the encoding correspond, in order, to the deceleration ratio, steering angle, stop command, audible and visual warning, and vibration warning, and the numbers directly represent the action execution parameters. Then, the standardized decision instruction set is bound to the current timestamp, the underground scene information, and the collision risk level, and stored in the decision cache pool of the edge computing device. The cache pool has a capacity of 10,000 entries and a retention time of 72 hours. This allows the latest decision instructions to be retrieved and resent if data transmission is interrupted during device execution, achieving decision fault tolerance. It also provides historical decision data for mine safety management, enabling the decision process to be traced and supporting mine safety analysis and model parameter optimization. Finally, the standardized decision instruction set is pushed in real time to the V2X control module and the smart wearable device communication module of step S5 via the internal data bus of the edge computing device, with a push latency of ≤5ms, ensuring that proactive risk avoidance operations can be executed immediately.
[0117] The status of personnel and vehicles and the environmental characteristics in a mine change dynamically in real time; for example, personnel move, equipment changes lanes, and temporary obstacles appear. This step is therefore not a single decision but a continuous iterative decision-making process with a 100ms cycle. The core iterative logic is: extract the new state → calculate the new Q-values → select the new action → output the new decision. This forms a closed loop of state perception, decision generation, action execution, and state update that adapts to these real-time changes underground. At the same time, the decision-making process supports dynamic adaptation to the underground scene. The edge computing device dynamically adjusts the weights of the Q-value calculation based on the risk level (high / medium / low) of the underground area identified from the video stream. In high-risk areas, the Q-value weight of safety actions such as stopping and emergency deceleration is increased (adjusted to 0.8) and the penalties related to traffic efficiency are reduced, prioritizing safety. In medium-risk areas, the weights remain unchanged and the default values are used, balancing safety and traffic efficiency. In low-risk areas, the Q-value weights of efficiency-oriented actions such as turning and slight deceleration are increased, while the Q-value of stopping actions is reduced, prioritizing the efficiency of underground operations. This ensures that the decision-making behavior matches the risk characteristics of the different underground areas.
[0118] S5. Based on the intervention decision, send corresponding control commands to the moving target through the vehicle network, and at the same time send corresponding early warning commands to the personnel through smart wearable devices to perform active risk avoidance operations.
[0119] Among them, the warning instructions sent to personnel through smart wearable devices include: sending personalized warning signals through the vibration module or sound and light alarm built into smart bracelets or safety helmets. The intensity and mode of the warning signal are dynamically adjusted according to the personnel's attention status information.
[0120] Control commands sent to moving targets via the Internet of Vehicles include: sending deceleration commands, steering commands, or stopping commands to the vehicle's automatic braking system or control system; among them, when the collision risk exceeds a preset threshold, the stopping command is executed in emergency stopping mode, and the priority and parameters of the command are calculated and determined in real time based on the collision risk level and the attention status information of surrounding personnel.
[0121] This step is used to transform the standardized optimal intervention decision instruction set output from step S4 into precise active control actions for downhole moving targets and personalized early warning actions for personnel's smart wearable devices through a dedicated downhole communication and control system. At the same time, it constructs a closed-loop mechanism for the entire process, including real-time feedback of execution status, dynamic correction of deviations, and emergency handling of anomalies, to ensure the timeliness, accuracy, and effectiveness of risk avoidance operations.
[0122] Before executing hazard avoidance operations, preliminary safeguards such as instruction parsing and equipment status pre-inspection must be completed. The edge computing device has a built-in instruction parsing engine that, according to the action quantification coding rules agreed upon in steps S3 and S4, reverse-parses the standardized digital code decision instructions into specific, actionable parameters. The parsing process fully preserves accuracy information and adapts to the execution resolution of the downhole equipment. The parsed action parameters are then encapsulated into equipment-specific instruction packages, with moving targets corresponding to vehicle control instruction packages and personnel wearable devices corresponding to warning instruction packages. All instruction packages carry a unique identifier, timestamp, and execution priority. Emergency stop and dual warnings are set to the highest priority of 1, while regular deceleration and single warnings are set to priority of 2, ensuring the orderly execution of instructions when multiple devices in the downhole are running in parallel. After completing instruction parsing, the edge computing device conducts millisecond-level full-state pre-inspection of all execution devices corresponding to the instruction packet through the underground communication system. The pre-inspection covers three core dimensions: communication link, device online status, and device performance status. Among them, the normal standard for communication link is signal strength ≥ -85dBm and packet loss rate ≤ 1%; the normal standard for device online status is heartbeat packet interval ≤ 100ms; for moving targets, there must be no fault codes in the braking, steering, and power systems and the execution accuracy must be ≥ 95%; for smart wearable devices, the battery level must be ≥ 20% and the sound, light, and vibration modules must be fault-free. 
The instruction packet is only sent to the execution device when all pre-inspection dimensions are normal. If any abnormality is found, a temporary emergency plan will be triggered immediately. The pre-inspection results are fed back to the execution monitoring panel of the edge computing device in real time and logged, providing data basis for subsequent anomaly tracing.
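The three-dimension pre-inspection can be expressed as a simple predicate. The thresholds below are the ones listed in the description; the `DeviceStatus` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeviceStatus:
    kind: str                     # "vehicle" or "wearable"
    rssi_dbm: float               # signal strength
    packet_loss: float            # fraction, 0.01 means 1%
    heartbeat_interval_ms: float
    fault_codes: tuple = ()       # vehicle: braking/steering/power faults
    execution_accuracy: float = 1.0
    battery: float = 1.0          # wearable: fraction of full charge
    modules_ok: bool = True       # wearable: sound, light, vibration modules


def precheck(dev):
    """Return True only if link, online status, and performance all pass."""
    link_ok = dev.rssi_dbm >= -85 and dev.packet_loss <= 0.01
    online = dev.heartbeat_interval_ms <= 100
    if dev.kind == "vehicle":
        perf_ok = not dev.fault_codes and dev.execution_accuracy >= 0.95
    else:
        perf_ok = dev.battery >= 0.20 and dev.modules_ok
    return link_ok and online and perf_ok
```

An instruction packet would be sent only to devices for which `precheck` returns `True`; any other result would trigger the temporary emergency plan.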
[0123] Precise control of underground moving targets is achieved through a mining industrial-grade vehicle-to-everything (V2X) system. This system adopts a converged architecture of "wired backbone + wireless branch," adapting to the communication environment of underground roadways with numerous obstructions and strong metal interference, ensuring that control commands are issued accurately without delay or packet loss. The wired backbone is a mining explosion-proof industrial Ethernet deployed in core areas such as main roads and intersections in underground roadways, with a bandwidth of ≥100Mbps and a latency of ≤5ms, providing communication assurance for high-speed and high-precision control of moving targets. The wireless branch is a mining intrinsically safe LoRa spread spectrum communication deployed in areas not covered by wired connections, such as working faces and temporary construction areas, operating at a frequency of 433MHz, with a communication distance of ≥500m and strong penetration capabilities, providing full-area coverage for the mobility control of moving targets. All underground moving targets are equipped with mining intrinsically safe vehicle-mounted communication terminals, supporting wired + wireless dual-mode switching, and possessing explosion-proof, waterproof, and vibration-resistant characteristics. The automatic reconnection time after communication interruption with the V2X system is ≤10ms. Edge computing devices distribute instructions in a tiered manner based on the execution priority of the instruction packets. After receiving the instructions, the vehicle-mounted automatic control system combines the current operating status of the moving target and the equipment performance parameters to achieve millisecond-level precise control: deceleration is divided into regular deceleration and emergency deceleration. 
Regular deceleration is executed linearly according to the analyzed deceleration ratio, with a response time ≤50ms and an execution accuracy of ±5%, avoiding sudden deceleration that could cause equipment or goods to fall or personnel to be jolted. Emergency deceleration is executed with maximum braking power, with an acceleration ≤2m / s², and simultaneously triggers the vehicle-mounted audible and visual alarm. Steering is executed precisely according to the analyzed angle, with an execution accuracy of ±1° and a steering speed ≤5° / s. During the steering process, the steering angle is corrected in real time based on the macroscopic driving path in step S3 to ensure that the vehicle continues to travel along the macroscopic path after the steering. Parking is divided into regular parking and emergency parking. Regular parking achieves smooth braking, with a braking distance ≤3m when the speed is ≤5m / s. Emergency parking triggers the emergency braking system of the mining equipment, with a braking distance ≤1.5m when the speed is ≤5m / s, and simultaneously cuts off the power system of the moving target. After emergency parking, the vehicle-mounted audible and visual alarm continues to be triggered until a release command is received. During the execution of control actions, the vehicle-mounted sensor group of the moving target collects operational status data such as current speed, real-time three-dimensional position, actual steering angle, and braking pressure at a frequency of 100Hz, and transmits it back to the edge computing device in real time through the vehicle network, providing accurate data support for subsequent execution status feedback and trajectory re-prediction.
[0124] Early warning on the personnel side is executed through intrinsically safe smart wearable devices and is triggered in a personalized manner. Underground, a dual-mode wireless communication system using Bluetooth 5.0 and LoRa provides full coverage and seamless distribution of early warning commands. Bluetooth 5.0 is deployed at working faces and in densely populated areas, with a communication distance ≥30m, low power consumption, and a latency ≤5ms. LoRa shares the base stations of the vehicle-to-everything (V2X) LoRa network, achieving full coverage underground. Both smart helmets and smart bracelets are equipped with intrinsically safe dual-mode communication modules for mining that support automatic switching between Bluetooth and LoRa, with a response time ≤10ms for receiving early warning commands. When the edge computing device sends out early warning command packets, it simultaneously pushes the personnel attention status information output in step S1 to the smart wearable device, and the device's alert control module dynamically adjusts the intensity, frequency, and combination of the alert based on that state, achieving targeted and personalized alerts. For the focused state, a single warning of regular intensity is used. For the distracted state, the warning intensity is increased and a second type of warning is added. For the fatigued state (regardless of the other indicator), the highest-intensity dual warning of sound and light plus vibration is forcibly triggered. For the combined distracted and fatigued state, the dual warning is triggered at the highest level. All warning parameters are calibrated and optimized for underground scenarios. The warning modules of the smart wearable devices all meet the intrinsically safe standards for mining. The sound and light module of the smart safety helmet is deployed at the front of the brim and uses penetrating red light and an interference-resistant buzzer sound.
The vibration module of the smart bracelet is deployed inside the bracelet, using an eccentric motor to ensure clear perception by personnel, and the trigger response time for the warning action is ≤20ms. Simultaneously, the smart bracelet is equipped with an intrinsically safe confirmation button for mining. After personnel perceive the warning and press the button, the wearable device sends a "warning perceived" feedback signal to the edge computing device. If no human feedback is received within 5 seconds, the edge computing device will further increase the warning intensity and mark the personnel as "high-risk unresponsive personnel," simultaneously notifying the on-site safety officer.
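The state-dependent warning rules and the 5-second confirmation timeout described above can be sketched as follows. This is an illustrative sketch only: the names `WarningPlan`, `select_warning`, and `needs_escalation`, the intensity labels, and the choice of sound and light as the single modality for the focused state are assumptions not given in the text.

```python
from dataclasses import dataclass

FEEDBACK_TIMEOUT_S = 5.0  # from the text: no confirmation within 5 seconds


@dataclass(frozen=True)
class WarningPlan:
    intensity: str     # "normal", "enhanced", or "max" (hypothetical labels)
    modalities: tuple  # subset of ("sound_light", "vibration")


def select_warning(distracted: bool, fatigued: bool) -> WarningPlan:
    """Map the attention state to a warning plan (four cases in the text)."""
    if fatigued:
        # Fatigue, with or without distraction: highest-intensity dual warning.
        return WarningPlan("max", ("sound_light", "vibration"))
    if distracted:
        # Distraction: enhanced intensity plus a second warning type.
        return WarningPlan("enhanced", ("sound_light", "vibration"))
    # Focused: single warning of conventional intensity
    # (which modality is used is an assumption here).
    return WarningPlan("normal", ("sound_light",))


def needs_escalation(elapsed_s: float, acknowledged: bool) -> bool:
    """True when the wearer has not pressed the confirmation button in time."""
    return (not acknowledged) and elapsed_s >= FEEDBACK_TIMEOUT_S


plan = select_warning(distracted=True, fatigued=False)
print(plan.intensity, plan.modalities)
```

In this sketch, a `True` result from `needs_escalation` corresponds to the point at which the edge computing device raises the warning intensity and notifies the on-site safety officer.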
[0125] Real-time feedback of execution status and trajectory re-prediction are the core components by which this method achieves dynamic closed-loop operation. The edge computing device, through a data fusion engine, fuses multi-source data, including vehicle status data of the moving target, feedback data from personnel's wearable devices, and visual recognition data from underground video streams, to obtain global data on the actual execution status of the personnel and vehicle. This data is then compared with the expected parameters of the decision command and the original predicted trajectory from step S2 to calculate the execution deviation. This step specifies explicit deviation thresholds for the underground scenario. For the moving target, the thresholds for the difference between the actual and expected deceleration ratio and between the actual and expected steering angle are ±10% and ±3°, respectively, and the threshold for the deviation between the actual position and the original predicted trajectory is ±0.5m. For personnel warning execution, failure to trigger the warning as instructed or a personnel feedback time exceeding 5 seconds constitutes a deviation. For the relative motion between personnel and vehicles, the threshold for the difference between the actual person-vehicle distance and the originally predicted distance is ±0.3m. Any deviation exceeding its threshold is considered "excessive deviation." When an excessive deviation is detected, the edge computing device immediately invokes the spatiotemporal trajectory prediction model from step S2, using the current actual execution state of the person and vehicle as input, to re-predict the trajectory for the next 3-5 seconds. The model inference latency is ≤50ms, which fits within the 100ms decision iteration cycle of step S4. 
Based on the new trajectory obtained from the re-prediction, the actual collision risk value between the person and vehicle is recalculated according to the formula Risk Value = 1 - Actual Distance / Safe Distance. A risk value ≥0.5 is set as high risk. The re-predicted trajectory and risk value are synchronized in real time to the decision iteration stage of step S4, providing updated state data for the Q-value calculation and optimal action combination selection in the next cycle, ensuring that the decision can dynamically adapt to the execution deviation. If all deviation indicators are within the threshold range, the edge computing device will continue to monitor the execution state of the person and vehicle until the collision risk is eliminated, and then issue an "Execution Termination" command. The moving target and the personnel's wearable devices immediately stop control and warning actions and resume normal operation.
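The deviation check and the risk formula in this paragraph can be illustrated with a short sketch. The thresholds and the formula Risk Value = 1 - Actual Distance / Safe Distance come from the text; the names `ExecutionDeviation`, `is_excessive`, `risk_value`, and `is_high_risk` are hypothetical, and treating the ±10% deceleration-ratio threshold as a difference expressed as a fraction is an assumption.

```python
from dataclasses import dataclass

# Thresholds taken from the text.
DECEL_RATIO_TOL = 0.10           # +/-10 % (interpreted as a fraction; assumption)
STEER_ANGLE_TOL_DEG = 3.0        # +/-3 degrees
POSITION_TOL_M = 0.5             # +/-0.5 m against the predicted trajectory
PERSON_VEHICLE_DIST_TOL_M = 0.3  # +/-0.3 m against the predicted distance
FEEDBACK_LIMIT_S = 5.0           # personnel feedback time limit
HIGH_RISK_THRESHOLD = 0.5        # risk value >= 0.5 is high risk


@dataclass
class ExecutionDeviation:
    decel_ratio: float            # actual minus expected deceleration ratio
    steer_angle_deg: float        # actual minus expected steering angle
    position_m: float             # offset from the original predicted trajectory
    person_vehicle_dist_m: float  # actual minus predicted person-vehicle distance
    warning_triggered: bool = True
    feedback_time_s: float = 0.0


def is_excessive(dev: ExecutionDeviation) -> bool:
    """True if any indicator exceeds its threshold ("excessive deviation")."""
    return (
        abs(dev.decel_ratio) > DECEL_RATIO_TOL
        or abs(dev.steer_angle_deg) > STEER_ANGLE_TOL_DEG
        or abs(dev.position_m) > POSITION_TOL_M
        or abs(dev.person_vehicle_dist_m) > PERSON_VEHICLE_DIST_TOL_M
        or not dev.warning_triggered
        or dev.feedback_time_s > FEEDBACK_LIMIT_S
    )


def risk_value(actual_distance_m: float, safe_distance_m: float) -> float:
    """Risk Value = 1 - Actual Distance / Safe Distance."""
    if safe_distance_m <= 0:
        raise ValueError("safe distance must be positive")
    return 1.0 - actual_distance_m / safe_distance_m


def is_high_risk(actual_distance_m: float, safe_distance_m: float) -> bool:
    return risk_value(actual_distance_m, safe_distance_m) >= HIGH_RISK_THRESHOLD


print(risk_value(2.0, 4.0))    # 0.5
print(is_high_risk(2.0, 4.0))  # True
```

With this formula, a risk value of 0.5 or more corresponds to an actual distance of at most half the safe distance.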
[0126] It should also be noted that, to address potential anomalies in the complex underground environment, such as communication interruptions, equipment malfunctions, personnel unresponsiveness, and sudden new risks, a tiered emergency response mechanism can be established to ensure that avoidance operations continue safely and without interruption under abnormal circumstances. The edge computing device uses multi-dimensional anomaly detection indicators, including communication link packet loss rate, equipment fault codes, personnel feedback timeouts, and sudden risk identification results, to identify anomalies in real time during execution. Anomalies are categorized into three emergency levels (Level 1, Level 2, and Level 3) based on their impact. Level 1 represents minor anomalies that do not affect core avoidance operations and can recover on their own; Level 2 represents moderate anomalies that affect some avoidance operations and require auxiliary execution; and Level 3 represents severe anomalies that directly cause avoidance operations to fail and require emergency response. A customized emergency plan is configured for each level. Level 1 anomalies include brief packet loss in the communication link (packet loss rate 1%~5%), low battery of wearable devices (10%~20%), and slight timeout in personnel feedback (5~10 seconds); the corresponding measures are command retransmission, low battery reminders with reduced warning power consumption, and enhancement of the level 1 warning intensity. Level 2 anomalies include single-device communication interruption (≤10s), a slight decrease in the accuracy of moving target control (execution deviation 10%~20%), and failure of a single warning module in the wearable device; the corresponding measures are switching to a backup communication link and using the mine's local broadcast to relay commands as an auxiliary channel, increasing the margin of control command execution to compensate for accuracy loss, and automatically switching to another type of early warning module.
Level 3 anomalies have the highest priority and include multi-device communication interruptions (>10s), moving target braking or steering system failures, personnel unresponsiveness (>10 seconds), and sudden new risks such as third-party intrusion. The corresponding measures include activating the underground mine broadcasting system and triggering an emergency alarm on the on-site safety officer's handheld terminal; triggering coordinated avoidance by adjacent moving targets; dispatching safety officers to the scene; and triggering global emergency avoidance and re-executing the prediction and decision-making process of steps S2-S4. After the anomaly scenario is handled, the edge computing device monitors the anomaly resolution status in real time. Once the anomaly is confirmed to be resolved, it immediately issues an "execution recovery" command to restore the moving target and personnel's wearable devices to the normal avoidance execution state. Based on the actual state of the personnel and vehicles after the anomaly handling, trajectory re-prediction and decision optimization are performed again to ensure the continued effective execution of the avoidance operation.
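The three-level grading described in this paragraph can be sketched as a simple rule table. The numeric ranges are taken from the text; the `Indicators` fields and the `classify_anomaly` function are hypothetical names, and indicator values that fall outside the stated ranges (for example, a packet loss rate above 5%) are left unclassified because the text gives no rule for them.

```python
from dataclasses import dataclass


@dataclass
class Indicators:
    packet_loss_pct: float = 0.0
    battery_pct: float = 100.0
    feedback_delay_s: float = 0.0
    comm_outage_s: float = 0.0
    devices_offline: int = 0
    execution_deviation_pct: float = 0.0
    warning_module_failed: bool = False
    brake_or_steering_failed: bool = False
    new_intrusion: bool = False


def classify_anomaly(ind: Indicators) -> int:
    """Return the highest matching emergency level (0 means no rule matched)."""
    level = 0
    # Level 1: minor anomalies that can recover on their own.
    if (1.0 <= ind.packet_loss_pct <= 5.0
            or 10.0 <= ind.battery_pct <= 20.0
            or 5.0 < ind.feedback_delay_s <= 10.0):
        level = 1
    # Level 2: moderate anomalies that need auxiliary execution.
    if ((ind.devices_offline == 1 and 0.0 < ind.comm_outage_s <= 10.0)
            or 10.0 <= ind.execution_deviation_pct <= 20.0
            or ind.warning_module_failed):
        level = 2
    # Level 3: severe anomalies that require emergency response.
    if ((ind.devices_offline > 1 and ind.comm_outage_s > 10.0)
            or ind.brake_or_steering_failed
            or ind.feedback_delay_s > 10.0
            or ind.new_intrusion):
        level = 3
    return level


print(classify_anomaly(Indicators(packet_loss_pct=3.0)))   # 1
print(classify_anomaly(Indicators(feedback_delay_s=12.0)))  # 3
```

Evaluating the levels in ascending order lets a more severe condition override a milder one when several indicators trigger at once, which is one possible way to realize the priority ordering described above.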
[0127] Once the risk of collision between personnel and vehicles is eliminated and the moving targets and personnel return to normal working conditions, the execution results of this hazard avoidance operation will be archived and the entire process reviewed, providing data support for mine safety management and continuous model optimization. Edge computing devices will systematically organize the entire hazard avoidance process data according to the flow of perception-prediction-modeling-decision-execution-feedback, encapsulating it into a standardized hazard avoidance file. This file will be uploaded to the mine's surface safety management platform via underground industrial Ethernet for permanent storage. The archived data includes perception data such as personnel and vehicle positions, personnel attention status, and roadway topology constraints in step S1; prediction data such as the original predicted trajectory, trajectory re-prediction data, and initial collision risk assessment results in step S2; modeling data such as the state space, action space, and real-time parameters of the reward function in step S3; decision data such as the single-action Q-value, optimal action combination, decision instruction encoding, and parsing results in step S4; and execution and anomaly data such as equipment execution parameters, execution status feedback data, deviation calculation results, and abnormal scene identification and processing results in step S5. This file supports full-process query and traceability. The mine safety management platform can use a data analysis engine to quantitatively review hazard avoidance records, calculating core performance indicators such as total response time, execution accuracy, risk mitigation time, and anomaly incidence rate. Based on the review results, it can perform offline optimization of the LSTM+Transformer hybrid trajectory prediction model, deep reinforcement learning decision model, and Q-value calculation rules. 
The optimized parameters will be synchronized to the underground edge computing devices during mine system updates, enabling continuous iterative upgrades of the model. Simultaneously, the review results of the hazard avoidance records will be synchronized to the mine safety management system, providing data support for optimizing work procedures in high-risk underground areas, maintaining and repairing frequently malfunctioning equipment, and providing safety training and work scheduling for personnel frequently experiencing distraction or fatigue. This reduces human, equipment, and environmental risks associated with personnel and vehicle collisions underground from the source.
[0128] According to an embodiment of the present invention, the mine safety monitoring method based on an AI-based identification and monitoring system constructs a state space using a deep reinforcement learning decision model, which integrates predicted motion trajectories, attention state information, and roadway topological constraints. It also defines an action space encompassing vehicle control and personnel early warning, along with a reward function that combines safety and efficiency, achieving a paradigm shift from passive alarm to proactive decision-making regarding underground personnel-vehicle collision risks. Through collaborative decision-making between an upper-level path planner and a lower-level action executor in a hierarchical reinforcement learning architecture, it achieves an organic combination of global path optimization and micro-dynamic risk avoidance. Furthermore, by using a policy network to calculate action Q-values in real time and select the optimal action combination, combined with a closed-loop mechanism of execution state feedback and trajectory re-prediction, it achieves adaptive decision optimization for dynamically changing underground environments. This invention enables the mine safety monitoring system to proactively output graded control commands and personalized early warning commands before personnel-vehicle collisions occur, improving the inherent safety level of underground personnel-vehicle collaborative operations.
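The reward structure and the selection of the action combination with the largest Q-value summarized above can be illustrated as follows. This is a sketch under stated assumptions: the weights `w_collision`, `w_safe`, and `w_delay`, the `"none"` action, and the hand-written Q-value table standing in for the policy network are all hypothetical; only the three reward terms and the action types come from the text.

```python
from itertools import product

# Action types from the text; "none" is an added assumption.
VEHICLE_ACTIONS = ("none", "decelerate", "steer", "stop")
PERSON_ACTIONS = ("none", "sound_light", "vibration")
ACTION_COMBINATIONS = tuple(product(VEHICLE_ACTIONS, PERSON_ACTIONS))


def reward(collided: bool, distance_m: float, safe_distance_m: float,
           delay_s: float, w_collision: float = 100.0,
           w_safe: float = 1.0, w_delay: float = 0.1) -> float:
    """Collision penalty + safe-distance reward + traffic-efficiency penalty."""
    r = 0.0
    if collided:
        r -= w_collision       # collision penalty term
    if distance_m >= safe_distance_m:
        r += w_safe            # safe distance maintenance reward term
    r -= w_delay * delay_s     # traffic efficiency penalty term
    return r


def select_action(q_values: dict) -> tuple:
    """Pick the (vehicle action, personnel action) pair with the largest Q-value."""
    return max(q_values, key=q_values.get)


# Stand-in for the policy network output: a hand-written Q-value table.
q_table = {combo: 0.0 for combo in ACTION_COMBINATIONS}
q_table[("decelerate", "sound_light")] = 0.4
q_table[("stop", "vibration")] = 0.9
print(select_action(q_table))  # ('stop', 'vibration')
```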
[0129] Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.
Claims
1. A mine safety monitoring method based on an AI-based recognition and monitoring system, characterized in that the AI-based recognition and monitoring system includes an image acquisition device, an edge computing device, and smart wearable devices deployed in the mine, and the method includes the following steps:
S1. The video stream of the tracked objects underground is acquired in real time through the image acquisition device; the edge computing device performs image recognition processing on the video stream to obtain the location information of each tracked object and the attention state information of the personnel among the tracked objects, wherein the tracked objects include moving targets and personnel;
S2. The edge computing device performs temporal image analysis on the video stream and extracts the motion feature sequence of each tracked object; the motion feature sequence and the attention state information are input into a pre-trained spatiotemporal trajectory prediction model to obtain the predicted motion trajectory of each tracked object in a future preset time period;
S3. The edge computing device constructs a deep reinforcement learning decision model, which includes a state space, an action space, and a reward function; the state space is composed of the predicted motion trajectory, the attention state information, and the topological constraints of the underground roadway identified from the video stream; the action space consists of deceleration commands, steering commands, and stopping commands for moving targets, as well as audio-visual warning commands and vibration warning commands for personnel; the reward function includes a collision penalty term, a safe distance maintenance reward term, and a traffic efficiency penalty term;
S4. The edge computing device inputs the state space of all tracked objects at the current moment into the deep reinforcement learning decision model; the deep reinforcement learning decision model calculates the Q value of each action according to the preset policy network and selects the action combination that maximizes the Q value as the intervention decision at the current moment;
S5. Based on the intervention decision, corresponding control commands are sent to the moving target through the vehicle network, and at the same time corresponding warning commands are sent to the personnel through the smart wearable devices, to perform active risk avoidance operations.
2. The mine safety monitoring method based on an AI-based recognition and monitoring system according to claim 1, characterized in that, in S1, the edge computing device performing image recognition processing on the video stream to obtain the location information of each tracked object includes: acquiring real-time video streams through an explosion-proof binocular camera in the image acquisition device, and simultaneously receiving, through an ultra-wideband (UWB) base station, ranging signals sent by UWB tags carried by the tracked objects; employing a tightly coupled fusion algorithm that injects the ranging signals as constraints into the graph optimization process of visual simultaneous localization and mapping (SLAM) and uses video frame feature point matching results to assist in identifying and eliminating UWB multipath errors; and outputting the centimeter-level three-dimensional coordinates and heading angle of the tracked object as the location information.
3. The mine safety monitoring method based on an AI-based recognition and monitoring system according to claim 1, characterized in that, in S1, the edge computing device performing image recognition processing on the video stream to obtain the attention state information includes: acquiring eye images of personnel through a high frame rate eye-tracking camera in the image acquisition device; performing eye key point detection and head posture estimation on the eye images to extract the gaze direction vector; transforming the gaze direction vector from the camera coordinate system to the three-dimensional space coordinate system of the tunnel, and calculating the gaze point coordinates of the personnel in the tunnel space by combining the three-dimensional coordinates in the location information; and generating the personnel's attention state information based on the positional relationship between the gaze point coordinates and the preset hazard source area, wherein the attention state information includes a focused state and a distracted state.
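The gaze-to-attention mapping recited in claim 3 can be illustrated with a small geometric sketch. It rests on several assumptions that the claim does not specify: the hazard source area is modeled as a sphere, the person is treated as focused when the gaze ray passes through that sphere, and the camera-to-tunnel transform is given as a 3×3 rotation matrix. All function names are hypothetical.

```python
import math


def rotate(rotation, vector):
    """Apply a 3x3 rotation matrix (camera frame -> tunnel frame) to a vector."""
    return tuple(sum(rotation[i][j] * vector[j] for j in range(3))
                 for i in range(3))


def ray_point_distance(origin, direction, point):
    """Shortest distance from a point to the ray starting at origin."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = tuple(d / norm for d in direction)
    to_point = tuple(p - o for p, o in zip(point, origin))
    t = sum(w * u for w, u in zip(to_point, unit))
    if t < 0:
        # The point lies behind the viewer; the nearest ray point is the origin.
        return math.dist(origin, point)
    closest = tuple(o + t * u for o, u in zip(origin, unit))
    return math.dist(closest, point)


def attention_state(eye_pos, gaze_cam, cam_to_tunnel, hazard_center, hazard_radius):
    """Return "focused" if the gaze ray intersects the hazard sphere."""
    gaze_tunnel = rotate(cam_to_tunnel, gaze_cam)
    distance = ray_point_distance(eye_pos, gaze_tunnel, hazard_center)
    return "focused" if distance <= hazard_radius else "distracted"


IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
eye = (0.0, 0.0, 1.6)
hazard = (10.0, 0.0, 1.6)
print(attention_state(eye, (1.0, 0.0, 0.0), IDENTITY, hazard, 1.0))  # focused
print(attention_state(eye, (0.0, 1.0, 0.0), IDENTITY, hazard, 1.0))  # distracted
```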
4. The mine safety monitoring method based on an AI-based recognition and monitoring system according to claim 1, characterized in that the spatiotemporal trajectory prediction model in S2 is a hybrid model combining a long short-term memory (LSTM) network and a Transformer; the hybrid model takes the motion feature sequence and the attention state information as input, extracts temporal dependency features through the LSTM, captures long-distance dependencies through the Transformer, and outputs the predicted motion trajectory of each tracked object in the next 3 to 5 seconds.
5. The mine safety monitoring method based on an AI-based recognition and monitoring system according to claim 1, characterized in that the deep reinforcement learning decision model built by the edge computing device in S3 adopts a hierarchical reinforcement learning architecture, including an upper-layer path planner and a lower-layer action executor; the upper-layer path planner uses a roadway topology map, constructed from the topological constraints of the underground roadway identified from the video stream, as its cognitive basis, and plans a macroscopic driving path for each moving target with the goals of minimizing global travel time and avoiding collisions; the lower-layer action executor receives the macroscopic driving path output by the upper-layer path planner and, combining in real time the predicted motion trajectory and attention state information of the surrounding tracked objects, outputs specific deceleration commands, steering commands, or stopping commands.
6. The mine safety monitoring method based on an AI-based recognition and monitoring system according to claim 1, characterized in that the warning commands sent to the personnel through the smart wearable device in S5 include: sending personalized warning signals through the vibration module and the sound and light alarm built into the smart bracelet or safety helmet, wherein the intensity and mode of the warning signal are dynamically adjusted according to the person's attention state information.
7. The mine safety monitoring method based on an AI-based recognition and monitoring system according to claim 1, characterized in that, in S5, the control commands sent to the moving target via the vehicle network include: sending deceleration commands, steering commands, or stopping commands to the vehicle's automatic braking system or control system; wherein, when the collision risk exceeds a preset threshold, the stopping command is executed in emergency stopping mode, and the priority and parameters of the command are calculated and determined in real time based on the collision risk level and the attention state information of the surrounding personnel.
8. The mine safety monitoring method based on an AI-based recognition and monitoring system according to claim 2, characterized in that the tightly coupled fusion algorithm further includes: when insufficient feature points in the video frame or jitter in the UWB signal is detected, fusing inertial measurement unit (IMU) data through Kalman filtering to output continuous and smooth position information.
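The IMU fallback recited in claim 8 can be illustrated with a deliberately simplified scalar Kalman filter along one axis. This is only a sketch, not the claimed tightly coupled algorithm: it assumes a velocity already derived from IMU data as the prediction input, fixed noise variances `q` and `r`, and a position measurement that is applied only when the vision/UWB fix is trusted. The class name `ScalarKalman` is hypothetical.

```python
class ScalarKalman:
    """One-dimensional Kalman filter: IMU-driven prediction, position update."""

    def __init__(self, x0: float, p0: float, q: float, r: float):
        self.x = x0  # position estimate (m)
        self.p = p0  # estimate variance (m^2)
        self.q = q   # process noise variance added per prediction step
        self.r = r   # measurement noise variance of the vision/UWB fix

    def predict(self, velocity_mps: float, dt_s: float) -> None:
        """Propagate the position with IMU-derived velocity; uncertainty grows."""
        self.x += velocity_mps * dt_s
        self.p += self.q

    def update(self, measured_position_m: float) -> None:
        """Blend in a trusted vision/UWB position; uncertainty shrinks."""
        gain = self.p / (self.p + self.r)
        self.x += gain * (measured_position_m - self.x)
        self.p *= (1.0 - gain)


kf = ScalarKalman(x0=0.0, p0=1.0, q=0.01, r=0.25)
for _ in range(3):
    # Vision features insufficient or UWB jittering: rely on IMU prediction only.
    kf.predict(velocity_mps=1.0, dt_s=0.1)
variance_before = kf.p
kf.update(measured_position_m=0.35)  # a trusted fix becomes available again
print(round(kf.x, 3), kf.p < variance_before)
```

During the prediction-only steps the variance increases by `q` each time, so the filter keeps producing a continuous position while signalling growing uncertainty until a trusted measurement arrives.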
9. The mine safety monitoring method based on an AI-based recognition and monitoring system according to claim 3, characterized in that the generation of the attention state information further includes: combining the person's head posture and gaze duration to determine whether the person is in a fatigued state, and using the fatigued state as an additional dimension of the attention state information.
10. The mine safety monitoring method based on an AI-based recognition and monitoring system according to claim 4, characterized in that the training of the spatiotemporal trajectory prediction model further includes: constructing a generative adversarial network (GAN), wherein the generator of the GAN generates, based on real motion feature sequences and attention state information, virtual motion feature sequences corresponding to dangerous human-vehicle interaction trajectories; and mixing the virtual motion feature sequences with the real motion feature sequences to augment the training of the spatiotemporal trajectory prediction model, so that the model learns the motion patterns of personnel and moving targets in extreme scenarios.