A method for sensing a state of a person based on a mobile robot and related devices
By using multimodal perception data and risk assessment models, mobile robots can quickly identify abnormal states of personnel in the warehouse environment and generate control commands, solving the problems of untimely or excessive response in existing technologies and improving safety and transportation efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- SHENZHEN CHUANGMENGLONG TECHNOLOGY CO LTD
- Filing Date
- 2026-01-28
- Publication Date
- 2026-06-16
AI Technical Summary
Existing mobile robots cannot effectively distinguish the diverse states of people in warehousing environments, resulting in untimely or excessive responses, safety hazards, and low transportation efficiency.
Multimodal sensing data is used to acquire the structured feature set and dynamic motion information of personnel. A risk assessment model is used to quickly determine the abnormal state of personnel and generate control commands to execute early warning response operations.
The method improves the accuracy of personnel status assessment, shortens risk response time, reduces the severity of safety accidents, and improves the transportation efficiency of mobile robots.
Figure CN122223637A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of robot navigation technology, and in particular to a method and related equipment for perceiving the state of personnel based on a mobile robot.
Background Technology
[0002] Currently, mobile robots are widely used in industrial scenarios such as unmanned warehousing for tasks such as material handling, sorting, and inventory. With the development of human-robot collaborative operation modes, dynamic interaction between robots and operators in the same scenario has become the norm, thus placing higher demands on the safety management of warehouse sites. Most existing warehouse robots mainly rely on traditional sensors such as LiDAR and ultrasonic sensors to achieve static obstacle avoidance and basic collision avoidance. Their perception dimensions are relatively limited, lacking an understanding of the status of personnel in the environment.
[0003] In existing technologies, mobile robots typically treat people in a moving environment as dynamic obstacles, and can only perform simple trajectory prediction and reactive obstacle avoidance based on basic kinematic information such as their position and speed. However, in actual complex warehousing operations, people may be in various states such as falling, crouching, standing still, or suddenly running, and their behavioral intentions and potential risks far exceed the scope of conventional obstacles. Relying solely on motion information cannot effectively distinguish these critical states, which may lead to untimely or excessive responses from mobile robots, posing significant safety hazards and failing to effectively improve transportation efficiency.
Summary of the Invention
[0004] To address the aforementioned technical problems, this application provides a method and related equipment for perceiving the state of personnel based on a mobile robot.
[0005] The technical solution provided in this application is described below:
[0006] The first aspect of this application provides a method for perceiving the state of personnel based on a mobile robot, the method comprising:
acquiring multimodal perception data of the environment surrounding the mobile robot;
continuously tracking, based on the multimodal perception data, at least one human target in the environment surrounding the mobile robot, and acquiring dynamic motion information of the human target;
determining perception trigger conditions;
when it is determined that a target person among the at least one human target meets a perception trigger condition, acquiring at least one frame of image to be analyzed that is associated with the target person;
analyzing the at least one frame of image to be analyzed to extract a structured feature set of the target person, the structured feature set including structured feature information reflecting the state of the target person;
determining a risk assessment result corresponding to the target person based on the structured feature set and the dynamic motion information;
generating control instructions based on the risk assessment result; and
controlling the mobile robot to perform an early warning response operation according to the control instructions.
[0007] Optionally, analyzing the at least one frame of image to be analyzed to extract a structured feature set of the target person includes:
performing person detection and key point localization on the at least one frame of image to be analyzed to obtain at least one human body key point coordinate sequence of the target person, wherein the human body key points include the positions of the main joints and contour feature points of the human body;
performing human body part analysis and posture prediction based on the key point coordinate sequence to generate a posture skeleton model of the target person;
calculating a posture feature vector of the target person based on the posture skeleton model, the posture feature vector describing the body orientation, limb extension, and overall posture category of the target person;
cropping and aligning, based on the at least one frame of image to be analyzed and the key point coordinate sequence, the target area where the target person is located to obtain a facial region image and an upper body region image of the target person;
performing facial feature analysis on the facial region image to extract a facial feature vector, which includes at least one of expression classification information, gaze direction prediction, and head posture angle;
performing appearance attribute recognition on the upper body region image to extract an appearance attribute feature vector, which includes at least one of clothing color, texture, clothing category, and whether glasses, hats, or masks are worn;
performing associated target detection on the at least one frame of image to be analyzed to identify items held by or accompanying the target person, and extracting an associated item feature vector, which includes the item's category, size, and positional relationship relative to the person; and
fusing and encoding the posture feature vector, the facial feature vector, the appearance attribute feature vector, and the associated item feature vector to form the structured feature set of the target person.
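As a rough illustration of how a posture category could be derived from key points and how the four feature vectors could be combined, the sketch below computes a torso inclination angle and concatenates the vectors. The key point names, the 60-degree threshold, and the use of plain concatenation as the "fusion" step are all assumptions made for this example; the application does not specify them.

```python
import math

def torso_inclination_deg(keypoints):
    """Angle between the hip-to-shoulder line and the image vertical, in degrees.

    keypoints maps hypothetical names to (x, y) pixel coordinates, y pointing down.
    """
    sx = (keypoints["left_shoulder"][0] + keypoints["right_shoulder"][0]) / 2
    sy = (keypoints["left_shoulder"][1] + keypoints["right_shoulder"][1]) / 2
    hx = (keypoints["left_hip"][0] + keypoints["right_hip"][0]) / 2
    hy = (keypoints["left_hip"][1] + keypoints["right_hip"][1]) / 2
    dx = sx - hx
    dy = hy - sy  # positive when the shoulders are above the hips
    return math.degrees(math.atan2(abs(dx), dy))

def posture_category(keypoints, lying_threshold_deg=60.0):
    """Coarse posture label; the threshold is an illustrative assumption."""
    angle = torso_inclination_deg(keypoints)
    return "lying_or_fallen" if angle >= lying_threshold_deg else "upright"

def build_feature_set(posture_vec, face_vec, appearance_vec, item_vec):
    """Keep each branch under a named key and add a flat concatenation."""
    return {
        "posture": list(posture_vec),
        "face": list(face_vec),
        "appearance": list(appearance_vec),
        "items": list(item_vec),
        "fused": [*posture_vec, *face_vec, *appearance_vec, *item_vec],
    }
```

Keeping the named branches alongside the fused vector lets a downstream risk model consume either form.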
[0008] Optionally, acquiring at least one frame of image to be analyzed associated with the target person, when it is determined that a target person among the at least one human target meets the perception trigger condition, includes:
when it is determined that the target person meets the perception trigger condition, locking the target person in the current environment scene as the individual to be perceived, based on the multimodal perception data;
obtaining the real-time spatial coordinates of the target person in the robot coordinate system;
generating control parameters based on the real-time spatial coordinates, the control parameters being used to control at least one image sensor of the mobile robot so that the center of the image sensor's field of view covers the target person; and
controlling the image sensor according to the control parameters to acquire at least one frame of image of the target person as the image to be analyzed.
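One way to picture the control parameters is as pan and tilt angles that point the camera's optical axis at the person. The sketch below assumes a robot frame with x forward, y left, and z up (in metres), and a camera mounted on the robot's vertical axis at `cam_height`. The application does not state the frame convention or that a pan-tilt mount is used, so both are assumptions.

```python
import math

def pan_tilt_to_target(x, y, z, cam_height=0.0):
    """Return (pan_deg, tilt_deg) that centre the camera on a point.

    Assumed frame: x forward, y left, z up; camera at (0, 0, cam_height).
    """
    pan = math.degrees(math.atan2(y, x))          # rotation about the vertical axis
    ground_range = math.hypot(x, y)               # horizontal distance to target
    tilt = math.degrees(math.atan2(z - cam_height, ground_range))
    return pan, tilt
```

For example, a person one metre ahead and one metre to the left at camera height gives a pan of about 45 degrees and a tilt of about 0 degrees.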
[0009] Optionally, determining the risk assessment result corresponding to the target person based on the structured feature set and the dynamic motion information includes:
correlating and fusing the structured feature set and the dynamic motion information of the target person to generate descriptive information of the target person's current behavior pattern;
comparing and analyzing, based on a preset risk assessment model, the descriptive information against multiple predefined risk patterns; and
outputting, based on the analysis results, a quantitative risk assessment result for the target person, which includes a risk level identifier and the judgment elements.
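The comparison against predefined risk patterns can be sketched as a simple rule lookup. The pattern names, the fields of the description, and the first-match policy below are hypothetical; the preset risk assessment model in the application may work quite differently.

```python
# Hypothetical risk patterns, checked in order; the first match wins.
RISK_PATTERNS = [
    {"name": "fallen_person", "level": "high",
     "requires": {"posture": "lying_or_fallen", "moving": False}},
    {"name": "running_toward_robot", "level": "high",
     "requires": {"moving": True, "approaching": True, "fast": True}},
    {"name": "long_stationary", "level": "medium",
     "requires": {"moving": False, "stationary_long": True}},
]

def assess(description):
    """Return the risk level plus the judgment elements that produced it."""
    for pattern in RISK_PATTERNS:
        required = pattern["requires"]
        if all(description.get(key) == value for key, value in required.items()):
            return {
                "level": pattern["level"],
                "pattern": pattern["name"],
                "elements": sorted(required),  # fields that drove the decision
            }
    return {"level": "low", "pattern": None, "elements": []}
```

Returning the matched fields alongside the level mirrors the requirement that the result carry both a risk level identifier and the judgment elements.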
[0010] Optionally, continuously tracking at least one human target in the environment surrounding the mobile robot based on the multimodal perception data and acquiring the dynamic motion information of the human target includes:
identifying, based on the multimodal perception data, at least one potential human target in the current environmental frame using a target detection algorithm;
acquiring initial observation data for the at least one potential human target, the initial observation data including location and geometric attribute data;
updating the motion state of the human target in real time by fusing the initial observation data using a time-series filtering algorithm; and
calculating and outputting the dynamic motion information based on the updated motion state.
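The application does not name the time-series filtering algorithm. As a minimal stand-in, the sketch below uses an alpha-beta filter (a fixed-gain simplification of a constant-velocity Kalman filter) to turn noisy position observations into a position, speed, and heading. The gains and the class name are assumptions.

```python
import math

class AlphaBetaTrack:
    """Fixed-gain constant-velocity filter for one tracked person (2D)."""

    def __init__(self, x, y, alpha=0.5, beta=0.1):
        self.pos = [float(x), float(y)]
        self.vel = [0.0, 0.0]
        self.alpha = alpha  # position correction gain
        self.beta = beta    # velocity correction gain

    def update(self, meas_x, meas_y, dt):
        """Fuse one position observation taken dt seconds after the last one."""
        for i, meas in enumerate((meas_x, meas_y)):
            predicted = self.pos[i] + self.vel[i] * dt
            residual = meas - predicted
            self.pos[i] = predicted + self.alpha * residual
            self.vel[i] += (self.beta / dt) * residual
        return self.motion_info()

    def motion_info(self):
        """Dynamic motion information derived from the current state."""
        return {
            "position": tuple(self.pos),
            "speed": math.hypot(self.vel[0], self.vel[1]),
            "heading_deg": math.degrees(math.atan2(self.vel[1], self.vel[0])),
        }
```

A full Kalman filter would additionally maintain a covariance estimate and adapt its gains; the fixed gains here keep the example short.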
[0011] Optionally, generating control instructions based on the risk assessment result includes:
matching the risk assessment result against a preset response strategy library to determine a basic response strategy;
parameterizing the basic response strategy based on the real-time dynamic motion information of the target person and the state of the mobile robot to generate at least one preliminary control command; and
arranging the timing of the at least one preliminary control command to generate a control command sequence.
[0012] Optionally, controlling the mobile robot to perform an early warning response operation according to the control instructions includes:
parsing the control instructions to obtain the instruction type and associated parameters, wherein the instruction types include communication warning, navigation motion, and multimodal interaction, and the associated parameters include target location, warning level, interaction content, and target personnel identification information; and
planning the mobile robot's response behavior sequence according to the instruction type and associated parameters, and driving the mobile robot to perform the response operation.
[0013] A second aspect of this application provides a personnel state perception device based on a mobile robot, the device comprising:
a first acquisition unit, used to acquire multimodal perception data of the environment surrounding the mobile robot;
a second acquisition unit, used to continuously track at least one human target in the environment surrounding the mobile robot based on the multimodal perception data, and to acquire the dynamic motion information of the human target;
a first determining unit, used to determine the perception trigger conditions;
a third acquisition unit, used to acquire at least one frame of image to be analyzed associated with the target person when it is determined that a target person among the at least one human target meets the perception trigger condition;
an extraction unit, used to analyze the at least one frame of image to be analyzed in order to extract a structured feature set of the target person, the structured feature set including structured feature information reflecting the state of the target person;
a second determining unit, used to determine the risk assessment result corresponding to the target person based on the structured feature set and the dynamic motion information;
a generation unit, used to generate control instructions based on the risk assessment result; and
an execution unit, used to control the mobile robot to perform an early warning response operation according to the control instructions.
[0014] A third aspect of this application provides a personnel state perception device based on a mobile robot, the device comprising a processor, a memory, an input/output unit, and a bus. The processor is connected to the memory, the input/output unit, and the bus. The memory stores a program, which the processor invokes to perform the method of the first aspect or of any optional implementation of the first aspect.
[0015] A fourth aspect of this application provides a computer-readable storage medium on which a program is stored; when executed on a computer, the program performs the method of the first aspect or of any optional implementation of the first aspect.
[0016] As can be seen from the above technical solutions, this application has the following beneficial effects: 1. This application uses multimodal perception data as a basis, which can more comprehensively capture the characteristics of people and their surrounding environment. Combined with the dynamic motion information of the target personnel obtained through continuous tracking and the structured feature set, it effectively realizes the dual analysis of dynamic behavior and static features, improves the accuracy of personnel status judgment, and effectively avoids misjudgment caused by single perception bias.
[0017] 2. Based on structured feature sets and dynamic motion information, this application can quickly determine the abnormal state of target personnel and generate control commands to execute early warning response operations, effectively shortening risk response time, reducing the severity of personnel safety accidents, and effectively improving the transportation efficiency of mobile robots.
Attached Figure Description
[0018] To more clearly illustrate the technical solutions in this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0019] Figure 1 is a schematic diagram of an embodiment of the personnel state perception method based on a mobile robot in this application;
Figure 2 is a schematic diagram of another embodiment of the personnel state perception method based on a mobile robot in this application;
Figure 3 is a schematic diagram of another embodiment of the personnel state perception method based on a mobile robot in this application;
Figure 4 is a schematic diagram of another embodiment of the personnel state perception method based on a mobile robot in this application;
Figure 5 is a schematic diagram of another embodiment of the personnel state perception method based on a mobile robot in this application;
Figure 6 is a schematic diagram of another embodiment of the digital twin-based personnel intrusion control method for industrial hazardous areas according to this application;
Figure 7 is a schematic diagram of one embodiment of the personnel state perception device based on a mobile robot according to this application;
Figure 8 is a schematic diagram of another embodiment of the personnel state perception device based on a mobile robot according to this application;
Figure 9 is a schematic diagram illustrating the application scenario of the personnel state perception device based on a mobile robot, as described in this application.
Detailed Implementation
[0020] It should be noted that the execution subject of the personnel status perception method based on mobile robots described in this embodiment is not limited to a single one and can be flexibly configured according to actual application scenarios and functional requirements. Specifically, all or part of the steps of the method can be executed independently by the mobile robot's own controller. The controller can be a hardware unit with data processing and instruction generation capabilities, such as a microprocessor, embedded control system, FPGA, or MCU built into the mobile robot, and implemented with corresponding software algorithms. Alternatively, it can be executed by an external device that establishes a communication connection with the mobile robot, such as a remote server, edge computing node, or industrial control computer. The external device receives multimodal perception data and dynamic motion information transmitted by the mobile robot, completes personnel tracking, feature extraction, risk assessment, and control instruction generation, and then sends the instructions to the mobile robot to drive it to perform early warning response operations. Alternatively, a collaborative execution method between the mobile robot and the external device can be adopted, that is, the two parties cooperate to complete the method process. For example, the mobile robot is responsible for multimodal data acquisition, initial personnel tracking, and early warning response operations, while the external device is responsible for complex image analysis, feature extraction, and risk assessment calculations, in order to balance the computing load of the device and improve perception and response efficiency.
[0021] Regardless of the execution entity configuration used, as long as the core steps of this method, such as multimodal data acquisition, personnel tracking, perception triggering, image analysis, feature extraction, risk assessment, instruction generation, and robot control, can be achieved, they should fall within the protection scope of this invention. The following embodiments, for ease of understanding, use the mobile robot's built-in controller as the main execution entity for detailed description; however, this description does not constitute a limitation on the execution entity of this method.
[0022] In existing technologies, mobile robots typically treat people in a moving environment as dynamic obstacles, and can only perform simple trajectory prediction and reactive obstacle avoidance based on basic kinematic information such as their position and speed. However, in actual complex warehousing operations, people may be in various states such as falling, crouching, standing still, or suddenly running, and their behavioral intentions and potential risks far exceed the scope of conventional obstacles. Relying solely on motion information cannot effectively distinguish these critical states, which may lead to untimely or excessive responses from mobile robots, posing significant safety hazards and failing to effectively improve transportation efficiency.
[0023] Based on this, this application provides a method and related equipment for perceiving the state of personnel based on mobile robots. Based on structured feature sets and dynamic motion information, it can quickly determine the abnormal state of the target personnel and generate control commands to execute early warning response operations, effectively shortening the risk response time, reducing the severity of personnel safety accidents, and effectively improving the transportation efficiency of mobile robots.
[0024] Please refer to Figure 1. This application discloses a method for perceiving the state of personnel based on a mobile robot, the method comprising:
101. Acquire multimodal perception data of the environment surrounding the mobile robot;
102. Based on the multimodal perception data, continuously track at least one human target in the environment surrounding the mobile robot and acquire the dynamic motion information of the human target;
103. Determine the perception trigger conditions;
104. When it is determined that a target person among the at least one human target meets the perception trigger condition, acquire at least one frame of image to be analyzed associated with the target person;
105. Analyze the at least one frame of image to be analyzed to extract the structured feature set of the target person, the structured feature set including structured feature information reflecting the state of the target person;
106. Determine the risk assessment result corresponding to the target person based on the structured feature set and the dynamic motion information;
107. Generate control instructions based on the risk assessment result;
108. Control the mobile robot to perform an early warning response operation according to the control instructions.
[0025] In this embodiment, multimodal perception data of the environment surrounding the mobile robot is first acquired. Then, at least one human target in the environment surrounding the mobile robot is continuously tracked based on the multimodal perception data, and dynamic motion information of the human target is acquired. Next, a perception trigger condition is determined. When it is determined that the target human among at least one human target meets the perception trigger condition, at least one frame of image to be analyzed associated with the target human is acquired, and the at least one frame of image to be analyzed is analyzed to extract a structured feature set of the target human. The structured feature set includes structured feature information that reflects the state of the target human. Then, the risk assessment result corresponding to the target human is determined based on the structured feature set and dynamic motion information. After obtaining the assessment result, a control command is generated based on the risk assessment result. Finally, the mobile robot is controlled to perform a warning response operation based on the control command.
[0026] In step 101, multimodal perception data of the surrounding environment of the mobile robot is first acquired. It should be noted that the mobile robot is pre-equipped with multimodal perception devices such as LiDAR, depth camera, infrared thermal imaging sensor, and microphone. These devices establish data communication connections with the robot's main control module. When the robot is in working state, each perception device synchronously collects surrounding environmental data according to a preset sampling frequency. Specifically, LiDAR is used to acquire 3D point cloud data of obstacles and human targets in the environment, depth camera is used to acquire color image and depth image data, infrared thermal imaging sensor is used to capture thermal radiation information of human targets to reflect their body temperature status, and microphone is used to collect sound signals in the surrounding environment, including human voices and sounds generated by actions. The main control module performs preliminary integration of the data collected by each perception device, removes invalid data caused by device noise, and forms a multimodal perception dataset in a unified format.
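The preliminary integration step can be pictured as picking, for each sensor, the reading closest to a common reference time and discarding invalid or stale readings. The sketch below is one such arrangement; the dictionary layout, the use of `None` to mark an invalid reading, and the 50 ms tolerance are assumptions for illustration.

```python
def build_frame(samples, ref_time, tolerance=0.05):
    """Assemble one unified multimodal frame around ref_time (seconds).

    samples: {sensor_name: [(timestamp, value), ...]}; a value of None
    marks an invalid reading (an assumed convention).
    """
    frame = {"timestamp": ref_time}
    for sensor, readings in samples.items():
        valid = [(t, v) for t, v in readings if v is not None]
        if not valid:
            continue  # nothing usable from this sensor
        t, value = min(valid, key=lambda tv: abs(tv[0] - ref_time))
        if abs(t - ref_time) <= tolerance:
            frame[sensor] = value  # close enough in time to be fused
    return frame
```

Sensors with no usable reading are simply absent from the frame, so downstream steps can check which modalities are available.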
[0027] In step 102, after acquiring the multimodal perception data, the system further continuously tracks at least one human target in the environment surrounding the mobile robot based on the multimodal perception data, and acquires the dynamic motion information of the human target. Specifically, the multimodal perception dataset is first fused, and a registration algorithm for point cloud data and depth images is used to identify human targets in the environment. For each identified human target, a multi-target tracking algorithm is used to establish a target tracking model by extracting the appearance features, contour features, and position information of the human target. During the robot's movement or the movement of the human target, the dynamic motion information such as the position coordinates, movement speed, and movement direction of each human target is updated in real time. At the same time, the system combines thermal image data collected by the infrared thermal imaging sensor and inter-frame difference data from the depth camera to eliminate tracking interruptions caused by environmental object occlusion, ensuring the continuity of human target tracking. When there are multiple human targets in the surrounding environment, a unique identification identifier is assigned to each human target, and the dynamic motion information of each target is recorded to form a human target motion information database.
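Assigning a unique identifier to each person and keeping it stable across frames can be sketched with greedy nearest-neighbour association. This example uses position only, whereas the description above also draws on appearance and contour features; the gate distance and the class name are assumptions.

```python
import itertools
import math

class TrackRegistry:
    """Keeps a stable ID per person using nearest-neighbour matching."""

    def __init__(self, gate=1.0):
        self.gate = gate              # max distance (m) to reuse an ID
        self.tracks = {}              # track_id -> last known (x, y)
        self._ids = itertools.count(1)

    def update(self, detections):
        """Match detections [(x, y), ...] to tracks; return {track_id: (x, y)}."""
        assigned = {}
        free = dict(self.tracks)      # tracks not yet matched in this frame
        for det in detections:
            best = min(free, key=lambda tid: math.dist(free[tid], det), default=None)
            if best is not None and math.dist(free[best], det) <= self.gate:
                del free[best]        # existing person, keep the ID
            else:
                best = next(self._ids)  # new person, issue a fresh ID
            assigned[best] = det
        self.tracks.update(assigned)
        return assigned
```

Unmatched tracks are kept rather than deleted, which gives a person who is briefly occluded a chance to be re-associated with the same identifier.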
[0028] In step 103, during continuous tracking of personnel targets, perception trigger conditions are determined. It should be noted that perception trigger conditions refer to a set of preset conditions used to determine whether a focused state analysis of the personnel target is necessary. This set of conditions is determined by combining the dynamic motion information of the personnel target, environmental scene characteristics, and safety management requirements, specifically including but not limited to abnormal motion state conditions of the personnel target. Examples include: a personnel target's movement speed suddenly exceeding a preset threshold, a sudden change in movement direction towards the robot or a specific hazardous area; abnormal physiological state conditions of the personnel target, such as a body temperature exceeding the normal range detected by an infrared thermal imaging sensor; and abnormal behavioral conditions of the personnel target, such as a personnel target remaining stationary for an extended period or abnormal contact with surrounding objects. These perception trigger conditions are preset by the user according to the actual application scenario, or can be dynamically adjusted by the robot using machine learning algorithms combined with historical data; no specific limitations are made here. The determined perception trigger conditions are stored as criteria for subsequent target personnel screening.
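The trigger conditions listed above lend themselves to a small configurable check. In the sketch below, every threshold value and field name is a placeholder chosen for illustration; the application leaves them to the user or to a learning algorithm.

```python
from dataclasses import dataclass

@dataclass
class TriggerConfig:
    """Hypothetical preset thresholds; real values depend on the site."""
    max_speed: float = 2.5                # m/s
    max_stationary_s: float = 60.0        # seconds without movement
    temp_range: tuple = (35.0, 37.5)      # degrees Celsius

def triggered_conditions(motion, cfg=None):
    """Return the names of all trigger conditions met by one person."""
    cfg = cfg or TriggerConfig()
    hits = []
    if motion.get("speed", 0.0) > cfg.max_speed:
        hits.append("speed_exceeded")
    if motion.get("heading_changed_toward_robot"):
        hits.append("sudden_turn_toward_robot")
    if motion.get("stationary_s", 0.0) > cfg.max_stationary_s:
        hits.append("long_stationary")
    temp = motion.get("body_temp_c")
    if temp is not None and not (cfg.temp_range[0] <= temp <= cfg.temp_range[1]):
        hits.append("abnormal_temperature")
    return hits
```

A person would be marked as the target person when the returned list is non-empty, matching the "any perception trigger condition" wording of step 104.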
[0029] In step 104, after acquiring the dynamic motion information of the personnel targets and determining the perception trigger conditions, further, when it is determined that at least one target person among the personnel targets meets the perception trigger conditions, at least one frame of image to be analyzed associated with the target person is acquired. Specifically, the dynamic motion information of each personnel target is compared with the preset perception trigger conditions in real time. When the motion state, physiological state, or behavioral state of a certain personnel target is detected to meet any perception trigger condition, the personnel target is marked as the target person. At this time, the main control module sends an image acquisition command to the depth camera, controlling the depth camera to acquire multiple frames of images continuously for the target person, or extracts keyframe images containing the target person from the acquired image sequence. These images to be analyzed must completely contain the full body or local feature areas of the target person, such as the face, limb movement areas, etc. At the same time, combined with the infrared thermal imaging data acquired in the previous steps, the thermal image of the target person is fused with the color image to generate an image to be analyzed containing temperature information, ensuring that the image to be analyzed can fully reflect the appearance and physiological characteristics of the target person.
[0030] In step 105, after at least one frame of image to be analyzed associated with the target person has been acquired, the at least one frame of image to be analyzed is analyzed to extract the structured feature set of the target person. The structured feature set includes structured feature information that reflects the state of the target person. Specifically, the main control module uses computer vision algorithms to process the acquired image to be analyzed. First, image preprocessing is performed, including denoising, enhancement, normalization, and other operations to improve image quality. Then the structured features of the target person are extracted from the preprocessed image. These include facial features, such as facial expressions and facial posture, used to determine the person's emotional state; limb features, such as limb movements and posture angles, used to determine whether the person is in an abnormal posture such as falling or curling up; and physiological features, such as body temperature distribution data extracted from the infrared thermal image, used to determine whether the person's body temperature is abnormal.
[0031] Simultaneously, by combining the sound signals collected by the microphone, the voice features or action sound features of the target personnel are extracted and incorporated into a structured feature set. The main control module organizes these extracted feature information according to a preset structured format to form a structured feature set that corresponds one-to-one with the target personnel. This feature set can comprehensively and quantitatively reflect the current state of the target personnel and provide feature input for risk assessment.
[0032] In step 106, the risk assessment result corresponding to the target person is determined based on the structured feature set and the dynamic motion information. Specifically, a risk assessment model is pre-built in the main control module. This model is trained with a deep learning algorithm, and the training data includes the structured feature sets, dynamic motion information, and corresponding risk level labels of people in different scenarios. During risk assessment, the main control module inputs the structured feature set and dynamic motion information of the target person into the risk assessment model. The model analyzes the feature data to determine the risk level of the target person's state, which is divided into low risk, medium risk, and high risk. For example, when the target person's body temperature is within the normal range, limb movements are normal, and movement is stable, the state is judged as low risk; when the target person has a slightly abnormal body temperature or stiff limb movements, it is judged as medium risk; when the target person has a severely abnormal body temperature, falls to the ground, or shows a sudden change in movement speed, it is judged as high risk. In addition, the risk assessment model uses the robot's current location information and environmental information to correct the risk level. For example, when the target person is in a dangerous area and the perception trigger conditions are met, the risk level is increased accordingly. Finally, the risk assessment result corresponding to the target person is generated and fed back to the main control module.
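The correction of the risk level by location can be illustrated separately from the learned model. The sketch below takes the level the model produced and raises it by one step when the person is in a dangerous area and a trigger condition is met. Raising by exactly one step and capping at high risk is an assumption; the description does not say how large the increase is.

```python
LEVELS = ["low", "medium", "high"]

def correct_risk_level(model_level, in_hazard_area, trigger_met):
    """Escalate the model's level by one step inside a dangerous area."""
    index = LEVELS.index(model_level)
    if in_hazard_area and trigger_met:
        index = min(index + 1, len(LEVELS) - 1)  # never exceed "high"
    return LEVELS[index]
```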
[0033] In step 107, after obtaining the risk assessment result, control instructions are generated based on the risk assessment result. The main control module has a preset response strategy library corresponding to different risk assessment results. When the risk assessment result is low risk, the generated control instruction is to maintain the current tracking state and continue to collect multimodal perception data. When the risk assessment result is medium risk, the generated control instruction includes controlling the mobile robot to move slowly towards the target person, and simultaneously activating the voice prompt module to issue a gentle inquiry voice. When the risk assessment result is high risk, the generated control instruction includes controlling the robot to immediately send an alarm signal to the preset monitoring center, and simultaneously controlling the robot to stay within a safe distance to continuously monitor the target person, and activating the sound and light warning device to issue a conspicuous warning light and alarm sound.
[0034] It should be noted that the control commands also include adjustment commands for each sensing device, such as increasing the sampling frequency of the camera and expanding the scanning range of the lidar, to ensure that more comprehensive target personnel status information can be obtained. The main control module encodes the generated control commands according to the communication protocol to form a command format that can be recognized by the robot's actuators.
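A response strategy library of the kind described for step 107 can be sketched as a lookup table followed by parameterization and ordering. The action names follow the low, medium, and high risk examples above, while the priority values, the 0.3 m/s approach speed, and the 2 m safe distance are illustrative assumptions.

```python
import math

# Risk level -> basic response strategy (actions taken from the examples above).
STRATEGY_LIBRARY = {
    "low": ["keep_tracking"],
    "medium": ["approach_slowly", "voice_inquiry"],
    "high": ["send_alarm", "hold_safe_distance", "sound_light_warning"],
}

# Hypothetical execution priorities; a lower number runs earlier.
PRIORITY = {
    "send_alarm": 0, "sound_light_warning": 1, "hold_safe_distance": 2,
    "approach_slowly": 2, "voice_inquiry": 3, "keep_tracking": 4,
}

def generate_command_sequence(risk_level, person_pos, robot_pos, safe_distance=2.0):
    """Match a strategy, fill in parameters, and order the commands."""
    distance = math.dist(person_pos, robot_pos)
    commands = []
    for action in STRATEGY_LIBRARY[risk_level]:
        command = {"action": action, "target": tuple(person_pos)}
        if action == "approach_slowly":
            # Stop approaching once the robot is within the safe distance.
            command["max_speed"] = 0.3 if distance > safe_distance else 0.0
        elif action == "hold_safe_distance":
            command["distance"] = safe_distance
        commands.append(command)
    return sorted(commands, key=lambda c: PRIORITY[c["action"]])
```

Sorting by priority puts the alarm ahead of the local warning for a high-risk result, which is one possible reading of "arranging the timing" of the preliminary commands.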
[0035] Finally, step 108 is executed, controlling the mobile robot to perform early warning response operations according to the control instructions. Specifically, after receiving the control instructions sent by the main control module, the mobile robot's motion control module and actuators execute the corresponding operations according to the instructions. When receiving the instruction to maintain tracking, the robot maintains its current motion trajectory and the working status of the sensing devices, continuously monitoring the target person. When receiving the instruction to move towards the target person and issue a voice inquiry, the motion control module plans the optimal movement path based on the target person's location information, controls the robot to move slowly, and at the same time, the voice module plays a preset inquiry voice, such as "Hello, do you need help?" When receiving alarm and audible / visual warning instructions, the robot's communication module immediately sends the target person's risk level, location information, structured feature set, and other data to the monitoring center. At the same time, the audible / visual warning module is activated, emitting a red flashing light and a high-frequency alarm sound to alert surrounding personnel. During the execution of the early warning response operation, the robot's main control module continuously receives multimodal sensing data, updates the target person's status information in real time, and adjusts the control instructions according to status changes to ensure the effectiveness and timeliness of the early warning response operation.
[0036] Specifically, control commands are parsed to obtain the command type and associated parameters. Command types include communication warning, navigation/motion, and multimodal interaction. Associated parameters include target location, warning level, interaction content, and target personnel identification information. Then, based on the command type and associated parameters, a response behavior sequence for the mobile robot is planned, and the mobile robot is driven to execute the response operation, thereby completing the warning response operation. Please refer to Figure 9 for the application scenario diagram of this application.
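The parsing and planning step can be sketched as follows. The `ControlCommand` class, the type strings, the parameter keys, and the ordering rule (warning first, then motion, then interaction) are hypothetical; the application names the command types and parameters but does not define how the sequence is ordered.

```python
from dataclasses import dataclass


@dataclass
class ControlCommand:
    command_type: str  # "communication_warning", "navigation_motion" or "multimodal_interaction"
    params: dict       # associated parameters (keys are assumed names)


# Assumed ordering: report first, then move, then interact.
PRIORITY = {
    "communication_warning": 0,
    "navigation_motion": 1,
    "multimodal_interaction": 2,
}


def plan_behavior_sequence(commands):
    """Turn parsed commands into an ordered list of behavior descriptions."""
    steps = []
    for cmd in sorted(commands, key=lambda c: PRIORITY[c.command_type]):
        p = cmd.params
        if cmd.command_type == "communication_warning":
            steps.append(f"report level {p['warning_level']} for person {p['person_id']}")
        elif cmd.command_type == "navigation_motion":
            steps.append(f"move toward {p['target_location']}")
        else:
            steps.append(f"say: {p['interaction_content']}")
    return steps
```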
[0037] Please refer to Figure 2. According to some embodiments of the present invention, step 105 involves analyzing the at least one frame of the image to be analyzed to extract the structured feature set of the target person, which may specifically include, but is not limited to, the following:
201. Perform human target detection and key point localization on the at least one frame of the image to be analyzed, so as to obtain at least one set of human body key point coordinate sequences of the target person, wherein the human body key points include the positions of the main joints and contour feature points of the human body;
202. Based on the sequence of human body key point coordinates, perform human body part analysis and posture prediction to generate a posture skeleton model of the target person;
203. Calculate and obtain the posture feature vector of the target person based on the posture skeleton model, wherein the posture feature vector is used to describe the body orientation, limb extension degree, and overall posture category of the target person;
204. Based on the at least one frame of the image to be analyzed and the sequence of coordinates of the human body key points, perform image cropping and alignment on the target area where the target person is located to obtain the facial area image and the upper body area image of the target person;
205. Perform facial feature analysis on the facial region image to extract a facial feature vector, wherein the facial feature vector includes at least one of expression classification information, gaze direction prediction, and head posture angle;
206. Perform appearance attribute recognition on the upper body region image to extract appearance attribute feature vectors, wherein the appearance attribute feature vectors include at least one of clothing color, texture, clothing category, and whether glasses, hats, or masks are worn;
207. Perform associated target detection on the at least one frame of the image to be analyzed to identify the items held or accompanied by the target person, and extract the associated item feature vector, wherein the associated item feature vector includes the item's category, size, and relative positional relationship with the person;
208. Perform feature fusion encoding on the posture feature vector, the facial feature vector, the appearance attribute feature vector, and the associated item feature vector to form a structured feature set of the target person.
[0038] In this embodiment, at least one frame of the image to be analyzed is analyzed to extract the structured feature set of the target person. Specifically, firstly, person target detection and key point localization are performed on at least one frame of the image to be analyzed to obtain at least one set of human body key point coordinate sequences of the target person. The human body key points include the positions of the main joints and contour feature points of the human body. Specifically, the main control module calls a preset target detection and key point localization algorithm to process the preprocessed image to be analyzed. This algorithm is trained based on a deep learning model and can first accurately select the area where the target person is located in the image, eliminate the influence of the background environment and other interfering objects, and then perform key point localization on the selected target person area. The localized human body key points specifically cover contour feature points such as the center of the eyebrows, corners of the eyes, and corners of the mouth of the head; neck nodes; upper limb joints such as shoulders, elbows, and wrists; lower limb joints such as hips, knees, and ankles; and torso contour feature points, etc. Each key point corresponds to a two-dimensional coordinate value in the image coordinate system. For multiple frames of images to be analyzed, the algorithm extracts the coordinates of key points frame by frame to form a continuous sequence of human body key point coordinates. At the same time, it corrects coordinate deviations through inter-frame matching algorithms to ensure the stability of key point positioning and provide coordinate data support for subsequent pose analysis.
[0039] After obtaining the human body keypoint coordinate sequence, the system further performs human body part analysis and posture prediction based on the keypoint coordinate sequence to generate a posture skeleton model of the target person. The main control module first performs correlation analysis on the human body keypoint coordinate sequence, connecting adjacent keypoints according to the logic of human physiological structure. For example, it connects the neck node to the shoulder keypoint, and the shoulder keypoint to the elbow keypoint, sequentially completing the analysis of the upper limbs, lower limbs, torso, and head, clarifying the human body part corresponding to each keypoint. Subsequently, through a posture prediction algorithm, combined with human kinematic constraints, the preliminary structure formed by the keypoint connections is optimized, correcting keypoint offset problems caused by image occlusion and noise interference, generating a posture skeleton model that accurately reflects the body structure relationships of the target person.
[0040] After obtaining the posture skeleton model, the posture feature vector of the target person is calculated based on the posture skeleton model. The posture feature vector is used to describe the target person's body orientation, limb extension, and overall posture category. The main control module uses a preset feature calculation algorithm to quantitatively analyze the coordinates and connections of key points in the posture skeleton model. Among them, body orientation is calculated by the relative positional relationship between head key points and torso key points, and converted into angle parameters in the image coordinate system. Limb extension is calculated by the distance and angle between each joint point, such as the angle between the upper arm and forearm, the angle between the thigh and the lower leg, and the deviation distance between the limb and the torso, forming quantitative values. The overall posture category is determined by comparing the current posture skeleton model with a preset posture template library to identify whether the target person is in a standing, walking, bending, falling, or curled-up posture, and converting it into the corresponding category code. The quantitative data and codes of the above body orientation, limb extension, and overall posture category are integrated and standardized into a fixed-dimensional posture feature vector to describe the posture state of the target person.
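The limb extension and body orientation quantities described above reduce to simple plane geometry on key point coordinates. The following is a minimal sketch under the assumption that key points are 2D pixel coordinates with the image y-axis pointing downward; the function names are hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at joint b between segments b->a and b->c,
    e.g. shoulder (a), elbow (b), wrist (c) for the upper arm and forearm."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("coincident key points")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def body_orientation(head, torso_center):
    """Angle in degrees of the torso->head direction measured from image 'up'.
    About 0 for an upright person, about +/-90 for a person lying horizontally."""
    dx = head[0] - torso_center[0]
    dy = head[1] - torso_center[1]  # image y grows downward
    return math.degrees(math.atan2(dx, -dy))
```

A fully extended arm gives a joint angle near 180 degrees, while a body orientation near 90 degrees is one cue that could feed the comparison against a falling posture template.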
[0041] While the posture feature vector is being obtained, image cropping and alignment are performed on the target area where the target person is located, based on the at least one frame of the image to be analyzed and the human body key point coordinate sequence obtained in the previous steps, so as to obtain the facial area image and the upper body area image of the target person.
[0042] Specifically, the main control module first determines the boundary range of the facial region based on the coordinate sequence of key points on the head. Taking the center of the eyebrows as the center, and combining the distribution of key points such as the corners of the eyes and mouth, the region is expanded by a preset pixel range to form the facial cropping area, so as to avoid the loss of facial features. Then, based on the coordinates of key points such as the neck, shoulders, and chest, the boundary of the upper body region is determined, covering the area from the neck to the waist, to ensure that the key appearance features of the upper body are completely included.
[0043] During the cropping process, an image alignment algorithm is used to rotate and scale the cropped facial and upper body images based on head key points. This eliminates image distortion caused by head offset and body tilt, ensuring that the facial and upper body regions in different frames are in a unified coordinate system. Simultaneously, the cropped and aligned images undergo grayscale normalization and noise removal to optimize image quality, preparing them for facial feature analysis and appearance attribute recognition.
[0044] After acquiring the facial region image, facial feature analysis is performed to extract facial feature vectors. These vectors include at least one of the following: expression classification information, gaze direction prediction, and head posture angles. The main control module employs a facial feature analysis algorithm to first refine and extract feature points from the aligned facial region image, accurately locating subtle feature points of organs such as the eyes, eyebrows, nose, and mouth. Based on the positional changes and morphological characteristics of these feature points, they are compared with a pre-set expression template library to identify the target person's expression category, such as calm, joy, sadness, anger, and pain, generating expression classification information. By analyzing the distribution relationship of eye feature points and combining pupil localization technology, the target person's gaze direction is predicted and converted into horizontal and vertical angle parameters. Through the relative positional relationship between key head points and facial feature points, head posture angles, including pitch, yaw, and roll angles, are calculated to quantify the head's tilt and rotation states. Finally, the expression classification encoding, gaze direction angle parameters, and head posture angles are integrated to form a facial feature vector, comprehensively reflecting the target person's facial state information.
[0045] Simultaneously, for the acquired upper body region image, appearance attribute recognition is performed to extract appearance attribute feature vectors. These feature vectors include at least one of the following: clothing color, texture, clothing category, and whether glasses, hats, or masks are worn. The main control module calls the appearance attribute recognition model to analyze the preprocessed upper body region image region by region. A color clustering algorithm extracts the main and secondary colors of the clothing, converting them into corresponding color space parameters. A texture feature extraction algorithm identifies the texture type of the clothing surface, such as solid color, stripes, checks, and floral patterns, generating texture feature parameters. These parameters are then combined with features such as the clothing's outline, cuff style, and collar type, and compared with a pre-set clothing category database to determine the clothing category, such as a T-shirt, shirt, or jacket. Simultaneously, for the head and face region, the system identifies whether the target person is wearing accessories such as glasses, hats, or masks, determining the type and wearing status of these accessories through feature matching. The above clothing-related features and accessory recognition results are quantified and encoded to form appearance attribute feature vectors, enriching the structured feature information of the target person.
[0046] After extracting facial features and appearance attributes, at least one frame of the image to be analyzed is used for associated target detection to identify items held or accompanied by the target person, and to extract associated item feature vectors. These feature vectors include the item's category, size, and relative position to the person. Specifically, the main control module uses an associated target detection algorithm, combined with the target person's regional range, to detect items around the target person and within their limb contact area in the image to be analyzed, excluding background objects unrelated to the target person. By comparing with a pre-set item category library, the specific category of the item is identified, such as mobile phones, water cups, tools, and dangerous goods. Based on the image pixel ratio and depth image data, the actual size range of the item is calculated and converted into a quantified value. By comparing the coordinates of the item with key points on the human body, the relative positional relationship between the item and the target person is determined, such as whether it is held, worn, or placed beside the body, and the specific corresponding body part. The item category code, size parameters, and relative positional relationship data are integrated to generate associated item feature vectors.
[0047] Finally, the posture feature vector, facial feature vector, appearance attribute feature vector, and associated item feature vector are fused and encoded to form a structured feature set of the target person. Specifically, the main control module uses a feature fusion algorithm to first standardize each feature vector, unifying feature data of different dimensions and magnitudes into the same feature space to eliminate interference caused by data differences. Then, through a weighted fusion strategy, combining the importance weights of each feature vector in the personnel status assessment, the four types of feature vectors—posture, face, appearance attribute, and associated item—are fused and calculated. Among them, the posture feature vector and facial feature vector have relatively high weights, focusing on reflecting the dynamic state and emotional and physiological state of the target person. The appearance attribute feature vector and associated item feature vector serve as auxiliary features to supplement information related to the person's identity and the surrounding environment.
[0048] After fusion, the multidimensional feature data is transformed into a structured feature matrix through an encoding algorithm, forming a structured feature set of the target personnel. This feature set can comprehensively and accurately quantify and reflect the target personnel's posture, expression, appearance attributes, and associated items, among other state information. The resulting structured features of the target personnel can provide complete and reliable feature input for risk assessment.
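The standardization and weighted fusion steps can be sketched as below. L2 normalization is used here as one simple way of bringing vectors of different magnitudes onto a common scale, and the weights are hypothetical values that only respect the stated ordering (posture and face weighted higher than appearance and associated items). Neither choice is specified in the application.

```python
import math

# Hypothetical importance weights: posture and face higher, the rest auxiliary.
WEIGHTS = {"posture": 0.35, "face": 0.35, "appearance": 0.15, "item": 0.15}
ORDER = ("posture", "face", "appearance", "item")


def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned as zeros."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return [0.0] * len(vector)
    return [x / norm for x in vector]


def fuse_features(features):
    """Normalize each feature vector, apply its weight, and concatenate
    the results in a fixed order to form one structured feature row."""
    fused = []
    for name in ORDER:
        fused.extend(WEIGHTS[name] * x for x in l2_normalize(features[name]))
    return fused
```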
[0049] Please refer to Figure 3. According to some embodiments of the present invention, in step 104, when it is determined that the target person among the at least one person target meets the perception triggering condition, at least one frame of image to be analyzed associated with the target person is acquired. Specifically, this may include, but is not limited to, the following:
301. When it is determined that the target person meets the perception triggering condition, the target person is locked from the current environment scene as the individual to be perceived based on the multimodal perception data;
302. Obtain the real-time spatial coordinates of the target person in the robot coordinate system;
303. Generate control parameters based on the real-time spatial coordinates, the control parameters being used to control at least one image sensor of the mobile robot so that the center of the observation field of view of the image sensor covers the target person;
304. Control the image sensor to acquire at least one frame of the target person as the image to be analyzed according to the control parameters.
[0050] In this embodiment, when the target person meets the perception triggering condition, the target person is identified as the individual to be perceived based on multimodal perception data. Specifically, the main control module of the mobile robot retrieves the acquired multimodal perception data, including 3D point cloud data from the LiDAR, environmental image data from the depth camera, and thermal radiation data from the infrared thermal imaging sensor. Combined with the dynamic motion information of the target person obtained through continuous tracking, the target person is distinguished from other objects and people in the current environment.
[0051] The main control module fits the contour features of the target person using point cloud data, marks the thermal radiation area of the target person using infrared thermal imaging data, and combines the pixel differences of the depth image to eliminate interference items in the background that do not match the characteristics of the target person. For example, it filters out still objects such as tables, chairs, and green plants in the environment, and excludes other people who do not meet the perception trigger conditions. In this way, in the complex current environment, it uniquely identifies the target person who meets the conditions and marks him as the individual to be perceived.
[0052] After locking onto the target individual, the robot obtains the real-time spatial coordinates of the target person in the robot coordinate system. The mobile robot pre-establishes a robot coordinate system with itself as the origin. This coordinate system has the robot's geometric center as the origin, the robot's forward direction as the positive X-axis, the vertical upward direction as the positive Z-axis, and the horizontal direction perpendicular to the X-axis and Z-axis as the positive Y-axis. The main control module combines the point cloud data collected by the LiDAR and the depth image data from the depth camera to calculate the coordinates of the locked target individual.
[0053] Specifically, the lidar can acquire the three-dimensional coordinate information of multiple points on the surface of the target person. The main control module clusters and fits these point cloud data to obtain the center position point of the target person. At the same time, the depth camera calculates the geometric relationship between the pixel points of the target person in the image and the optical center of the camera to obtain the distance and orientation information of the target person relative to the camera. Then, according to the installation position parameters of the camera on the robot, the position information is converted to the robot coordinate system. The main control module fuses and calibrates the coordinate data acquired by the two devices to eliminate the deviation caused by equipment installation errors and measurement noise. Finally, the real-time spatial coordinates of the individual to be perceived in the robot coordinate system are obtained. This coordinate data is updated in real time and can effectively reflect the positional changes of the target person relative to the robot.
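The coordinate conversion and fusion can be sketched as follows. For brevity the camera mount is assumed to differ from the robot frame only by a yaw rotation and a translation, the camera-frame point is assumed to already use an x-forward, z-up axis convention, and the fusion is a fixed weighted average. These simplifications and all names are assumptions; a real system would use a full calibrated extrinsic transform.

```python
import math


def centroid(points):
    """Center position of a clustered set of 3D lidar points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def camera_to_robot(p_cam, mount_xyz, mount_yaw_deg):
    """Rotate a camera-frame point about the Z axis by the mount yaw,
    then translate by the camera's mounting position on the robot."""
    yaw = math.radians(mount_yaw_deg)
    x, y, z = p_cam
    xr = math.cos(yaw) * x - math.sin(yaw) * y + mount_xyz[0]
    yr = math.sin(yaw) * x + math.cos(yaw) * y + mount_xyz[1]
    zr = z + mount_xyz[2]
    return (xr, yr, zr)


def fuse_positions(lidar_xyz, camera_xyz, lidar_weight=0.7):
    """Weighted average of the two estimates (the weight is a hypothetical value)."""
    return tuple(
        lidar_weight * l + (1.0 - lidar_weight) * c
        for l, c in zip(lidar_xyz, camera_xyz)
    )
```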
[0054] After obtaining the real-time spatial coordinates of the target person, control parameters are generated based on these coordinates. These parameters are used to control at least one image sensor of the mobile robot, ensuring that the center of the image sensor's field of view covers the target person. The main control module first retrieves the parameter information from the image sensor, including the field of view size, current rotation angle, and pitch angle. Then, based on the obtained real-time spatial coordinates of the target person, it calculates the azimuth and pitch angles of the target person relative to the image sensor. Using the current position of the image sensor as a reference, it determines the angle difference that needs to be adjusted.
[0055] For example, when the target person is located 30° to the right of the positive X-axis in the robot coordinate system, the calculation shows that the image sensor needs to be rotated horizontally to the right by 30°. Simultaneously, the corresponding pitch angle adjustment value is calculated based on the target person's height coordinates. The main control module integrates these angle adjustment values, sensor focal length adjustment values, and other data to generate a set of standardized control parameters. These control parameters also include instructions for adjusting the image sensor's sampling frequency. When the target person is in motion, the sampling frequency is appropriately increased to ensure the capture of clear, continuous image frames. These control parameters are encoded according to a preset communication protocol for transmission to the image sensor's drive module.
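The angle calculation in this example can be sketched with two `atan2` calls. The sketch assumes the image sensor sits above the robot origin at a known height, and that the robot frame is right-handed, so the positive Y-axis points to the left and a target on the right yields a negative azimuth. The application does not state the handedness, and the function name and sign convention are hypothetical.

```python
import math


def pan_tilt_adjustment(target_xyz, sensor_height, current_pan_deg, current_tilt_deg):
    """Return (pan, tilt) changes in degrees that center the target in the field of view.
    Negative pan means rotating to the right under the assumed right-handed frame."""
    x, y, z = target_xyz
    azimuth = math.degrees(math.atan2(y, x))
    ground_distance = math.hypot(x, y)
    elevation = math.degrees(math.atan2(z - sensor_height, ground_distance))
    # Wrap the pan difference into [-180, 180) so the sensor takes the short way round.
    d_pan = (azimuth - current_pan_deg + 180.0) % 360.0 - 180.0
    d_tilt = elevation - current_tilt_deg
    return d_pan, d_tilt
```

With the target 30 degrees to the right of the positive X-axis and the sensor currently facing forward, the sketch returns a pan change of about -30 degrees, that is, a 30 degree rotation to the right.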
[0056] After generating the control parameters, the image sensor is controlled to acquire at least one frame of the target person as the image to be analyzed. Specifically, after receiving the control parameters sent by the main control module, the image sensor's drive module immediately executes the corresponding adjustment operations. According to the angle adjustment values in the control parameters, the image sensor is driven to complete the horizontal rotation and pitch angle adjustments until the center of the image sensor's field of view accurately covers the target person. At the same time, according to the focal length adjustment values in the control parameters, the sensor's focal length is adjusted to ensure that the target person presents a clear imaging effect in the field of view.
[0057] Once the image sensor has completed its adjustment and reached a stable state, it begins acquiring image data of the target person according to the sampling frequency set by the control parameters. If the target person is stationary, at least one clear image containing the target person's entire body and key feature areas is acquired. If the target person is moving, multiple frames are acquired continuously to capture dynamic posture changes. The acquired image data is transmitted to the main control module in real time. The main control module performs preliminary format conversion and noise reduction processing on the image data, and finally determines these images as the images to be analyzed.
[0058] Please refer to Figure 4. According to some embodiments of the present invention, in step 106, determining the risk assessment result corresponding to the target person based on the structured feature set and the dynamic motion information may specifically include, but is not limited to, the following:
401. The structured feature set and dynamic motion information of the target person are correlated and fused to generate descriptive information of the target person's current behavior pattern;
402. Based on a preset risk assessment model, the descriptive information is compared and analyzed with multiple predefined risk patterns;
403. Output the quantitative risk assessment results of the target person based on the analysis results, wherein the quantitative risk assessment results include risk level identification and judgment elements.
[0059] In this embodiment, the structured feature set and dynamic motion information of the target person are correlated and fused to generate descriptive information about the target person's current behavior pattern. It should be noted that the correlation and fusion operation relies on a data fusion algorithm built into the mobile robot's main control module. This algorithm can establish a mapping relationship between the structured feature set and the dynamic motion information. Specifically, the main control module first extracts the target person's structured feature set, including quantitative data such as facial expressions, body posture, body temperature distribution, and voice features. Simultaneously, it acquires the target person's dynamic motion information, including parameters such as real-time position coordinates, movement speed, movement direction, and movement trajectory curvature.
[0060] After obtaining the above information, a feature association algorithm is used to match information with consistent timestamps between the two types of data. For example, the limb posture characteristics of a target person at a certain moment are associated with the movement speed data at that moment to determine whether the limb posture and movement state match. Then, through multi-dimensional data fusion rules, the scattered feature data and movement data are integrated into logically related behavioral description elements. For example, when it is detected that the target person's limbs are curled up, body temperature is higher than the normal threshold, movement speed is 0, and the person is continuously in a fixed position, the fusion algorithm will associate and integrate these data to generate behavioral pattern description information such as "the target person is stationary in a certain position, limbs are curled up, and body temperature is abnormally high". This description information adopts a standardized text and parameter combination format, including both qualitative behavioral feature descriptions and quantitative parameter indicators, which can comprehensively reflect the current behavioral state of the target person.
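The timestamp matching between feature data and motion data can be sketched as a nearest-neighbor search within a tolerance. The tolerance value, the tuple layout, and the function name are assumptions made for illustration.

```python
from bisect import bisect_left


def associate_by_timestamp(features, motions, tolerance=0.05):
    """Pair each (timestamp, feature) with the nearest (timestamp, motion) sample.
    `motions` must be sorted by timestamp; pairs further apart than `tolerance`
    seconds are dropped rather than matched."""
    times = [t for t, _ in motions]
    pairs = []
    for t, feature in features:
        i = bisect_left(times, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(times[k] - t))
        if abs(times[j] - t) <= tolerance:
            pairs.append((t, feature, motions[j][1]))
    return pairs
```

Each returned tuple holds a feature observation and the motion sample closest to it in time, which is the raw material for a combined behavior description such as "limbs curled up, speed 0".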
[0061] After generating descriptive information about the target individual's current behavioral patterns, the system compares and analyzes this information against multiple predefined risk patterns based on a pre-defined risk assessment model. This model is an intelligent analysis model trained on a large amount of historical human behavior data and contains multiple pre-defined risk patterns validated in real-world scenarios. These pre-defined risk patterns cover typical risk behaviors in different application scenarios. Examples include fall risk patterns in nursing homes, unauthorized entry risk patterns in industrial parks, and sudden illness risk patterns in home care settings. Each pre-defined risk pattern corresponds to specific behavioral characteristic parameter thresholds and descriptive information templates.
[0062] When performing comparative analysis, the main control module first quantifies the current behavioral pattern description information, extracting key parameters such as body temperature, posture angle, duration of stillness, and deviation of movement trajectory. These quantified indicators are then input into the risk assessment model. The model uses a similarity calculation algorithm to compare the key parameters of the current behavioral pattern with the feature parameter thresholds of each predefined risk pattern. Simultaneously, semantic matching analysis is performed on the qualitative description information. For example, the current behavioral description of "limbs curled up, stillness, body temperature 39.5℃" is compared with the predefined features of "abnormal limb posture, prolonged stillness, body temperature exceeding 38.5℃" in the "sudden illness risk pattern," and the matching degree between the two is calculated in order to identify the risk type corresponding to the current behavioral pattern.
[0063] After obtaining the comparison and analysis results, the quantitative risk assessment results of the target personnel are output based on the analysis results. The quantitative risk assessment results include risk level identification and judgment elements. The main control module first determines the matching level between the current behavior pattern and the predefined risk pattern based on the matching degree calculation results. When the matching degree is higher than the preset high threshold, it is judged as a high match; when the matching degree is in the middle threshold range, it is judged as a medium match; and when the matching degree is lower than the low threshold, it is judged as a low match.
[0064] Then, combining the matching level and the risk weights of predefined risk patterns, a risk level identifier is generated. This identifier uses a numerical or character-based grading system, for example, divided into levels 1-5, where level 1 represents no risk, level 2 represents low risk, level 3 represents medium risk, level 4 represents high risk, and level 5 represents extremely high risk. Simultaneously, the main control module extracts the core judgment elements for this risk assessment. These elements include key characteristic parameters and behavioral descriptions that trigger the risk. For example, "Risk level 4 (high risk), judgment elements: 1. Target person's body temperature is 39.5℃, exceeding the normal threshold by 1.5℃; 2. Limbs exhibit an abnormal curled-up posture, remaining still for 10 minutes; 3. The behavioral pattern matches the predefined sudden illness risk pattern by 92%." These judgment elements explain the basis for the risk level determination, facilitating subsequent traceability and analysis. Finally, the main control module integrates the risk level identifier and judgment elements according to a preset format to form a complete quantitative risk assessment result.
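The mapping from matching degree to a quantitative result can be sketched as follows. The application does not define how the matching level and the risk weight are combined, so the product rule, the score bins, the thresholds, and the example weight of 0.8 are all hypothetical.

```python
from bisect import bisect_right

LEVEL_NAMES = {
    1: "no risk",
    2: "low risk",
    3: "medium risk",
    4: "high risk",
    5: "extremely high risk",
}
SCORE_BINS = [0.2, 0.4, 0.6, 0.8]  # hypothetical boundaries between levels 1 to 5


def match_level(degree, high=0.8, low=0.5):
    """Classify a matching degree in [0, 1] using assumed high and low thresholds."""
    if degree > high:
        return "high"
    if degree < low:
        return "low"
    return "medium"


def assess(pattern_name, degree, pattern_weight, elements):
    """Combine matching degree and pattern risk weight into a level identifier
    and attach the judgment elements that explain the result."""
    score = degree * pattern_weight
    level = 1 + bisect_right(SCORE_BINS, score)
    return {
        "risk_level": level,
        "risk_name": LEVEL_NAMES[level],
        "match_level": match_level(degree),
        "judgment_elements": list(elements)
        + [f"matches {pattern_name} pattern at {degree:.0%}"],
    }
```

Under these assumed values, a 92% match with a pattern weight of 0.8 gives a score of about 0.74, which falls in the level 4 (high risk) bin.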
[0065] Please refer to Figure 5. According to some embodiments of the present invention, in step 102, at least one human target in the environment surrounding the mobile robot is continuously tracked based on the multimodal perception data, and the dynamic motion information of the human target is obtained. Specifically, this may include, but is not limited to, the following:
501. Based on the multimodal perception data, identify at least one potential human target in the current environmental frame using a target detection algorithm;
502. Obtain initial observation data for at least one potential human target, the initial observation data including location and geometric attribute data;
503. Based on the time-series filtering algorithm, fuse the initial observation data and update the motion state of the personnel target in real time;
504. Calculate and output the dynamic motion information based on the updated motion state.
[0066] In this embodiment of the application, based on multimodal perception data, at least one potential human target in the current environmental frame is identified by a target detection algorithm. The multimodal perception data includes three-dimensional point cloud data from a lidar, color and depth images from a depth camera, and thermal radiation data from an infrared thermal imaging sensor.
[0067] Specifically, the main control module first performs spatiotemporal registration on the data collected by different sensing devices, ensuring coordinate alignment of data from each device within the same environmental frame. Then, it calls a pre-set target detection algorithm, which can employ deep learning detection models such as YOLO or Faster R-CNN, without specific limitations. The model has been trained with a large amount of sample data containing people of different body types and clothing, enabling it to adapt to personnel recognition needs in complex environments. During detection, the algorithm extracts features from the person's outline in the color image, the three-dimensional structure of the human body in the point cloud data, and the thermal radiation outline of the human body in the infrared thermal imaging data. These three types of features are then fused to identify at least one potential person target in the current environmental frame. A corresponding detection box and confidence level are labeled for each potential person target. When the confidence level exceeds a preset threshold, the corresponding person is included in the target range for subsequent tracking.
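The final confidence check in this paragraph can be sketched as below. The `Detection` type, the function name, and the 0.5 default are hypothetical; the description only states that a preset threshold exists, and the detection model itself (for example YOLO or Faster R-CNN) is outside this sketch.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    box: tuple          # (x_min, y_min, x_max, y_max) in pixels
    confidence: float   # fused detection confidence in [0, 1]


def select_tracking_targets(detections, threshold=0.5):
    """Keep detections whose confidence exceeds the preset threshold.

    The default of 0.5 is an assumed value for illustration.
    """
    return [d for d in detections if d.confidence > threshold]


candidates = [
    Detection((10, 20, 60, 180), 0.91),
    Detection((200, 40, 240, 150), 0.32),
    Detection((300, 30, 350, 190), 0.50),
]
targets = select_tracking_targets(candidates)
```

The comparison is strict because the text says the confidence must exceed the threshold, so the third candidate at exactly 0.50 is not tracked.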
[0068] After identifying potential personnel targets, initial observation data for at least one target is acquired. This initial observation data includes position and geometric attribute data. Once the potential personnel targets are identified, the main control module extracts the initial observation data for each target from the registered multimodal perception data. The position data is calculated from the point cloud data of the LiDAR. Specifically, this involves the three-dimensional coordinates of the potential personnel target in the robot coordinate system, which are then corrected using depth information from the depth camera to ensure accuracy. The geometric attribute data includes parameters such as the potential personnel target's height, body contour dimensions, and limb length-to-width ratio. These parameters are obtained through pixel-by-pixel analysis and measurement of the personnel target's contour in the depth image. The main control module integrates the position and geometric attribute data of each potential personnel target to form the corresponding initial observation dataset.
[0069] After acquiring the initial observation data, a temporal filtering algorithm is used to fuse the initial observation data and update the motion state of the personnel target in real time. Because the mobile robot experiences jitter during movement, and the personnel target exhibits autonomous movement and posture changes, the initial observation data of a single frame contains certain errors. Therefore, a temporal filtering algorithm is needed to fuse the initial observation data from multiple frames. It should be noted that the temporal filtering algorithm can be Kalman filtering, particle filtering, etc. This application uses the Kalman filtering algorithm as an example. Specifically, this algorithm first predicts the position and motion parameters of the target in the current frame based on the motion state of the personnel target in the previous frame. Then, it compares the initial observation data extracted from the current frame with the predicted values, and corrects the prediction results by calculating the optimal estimate, thereby obtaining more accurate motion state data. During the fusion process, the algorithm assigns different weights to the observation data from different sensing devices.
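A minimal sketch of the predict-then-correct cycle described above, for a single position axis with a constant-velocity model. The state layout, the process noise values, and the per-sensor variances in `SENSOR_VARIANCE` are assumptions for illustration; a smaller variance gives that sensor a larger weight in the fused estimate, which is one common way to realize the per-device weighting mentioned in the paragraph.

```python
from dataclasses import dataclass

# Hypothetical measurement variances (m^2); smaller means more trusted.
SENSOR_VARIANCE = {"lidar": 0.01, "depth_camera": 0.04}


@dataclass
class Track1D:
    p: float = 0.0      # position estimate (m)
    v: float = 0.0      # velocity estimate (m/s)
    p00: float = 1.0    # covariance matrix entries
    p01: float = 0.0
    p10: float = 0.0
    p11: float = 10.0

    def predict(self, dt, q_pos=1e-4, q_vel=1e-2):
        """Propagate state and covariance: P = F P F^T + Q, F = [[1, dt], [0, 1]]."""
        self.p += self.v * dt
        a = self.p00 + dt * self.p10
        b = self.p01 + dt * self.p11
        self.p00 = a + dt * b + q_pos
        self.p01 = b
        self.p10 = self.p10 + dt * self.p11
        self.p11 = self.p11 + q_vel

    def update(self, z, r):
        """Correct the prediction with a position measurement z of variance r."""
        y = z - self.p                  # innovation
        s = self.p00 + r                # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s   # Kalman gain
        self.p += k0 * y
        self.v += k1 * y
        p00, p01 = self.p00, self.p01
        self.p00 = (1 - k0) * p00
        self.p01 = (1 - k0) * p01
        self.p10 -= k1 * p00
        self.p11 -= k1 * p01


def step(track, dt, observations):
    """One frame: predict, then fuse each sensor's observation in turn."""
    track.predict(dt)
    for sensor, z in observations.items():
        track.update(z, SENSOR_VARIANCE[sensor])


track = Track1D()
for k in range(1, 51):                  # target walking at 1 m/s, 10 Hz frames
    true_pos = 0.1 * k
    step(track, 0.1, {"lidar": true_pos, "depth_camera": true_pos})
```

After the 50 frames the velocity estimate settles close to the true 1 m/s even though the track started at rest. A practical tracker would extend this to two or three axes and to noisy measurements.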
[0070] Next, based on the updated motion state, the main control module calculates and outputs dynamic motion information. First, it parses the filtered motion state data, converting parameters such as instantaneous velocity and acceleration into physical quantities suitable for the actual application scenario. The main control module then compares the motion state data with preset parameter thresholds, identifying abnormal changes in the motion state, such as sudden changes in velocity or sharp deflections in the direction of motion. Finally, the main control module integrates the real-time position, velocity, direction of motion, acceleration, and motion trend parameters of the target person, encapsulates them according to a preset data format, and outputs standardized dynamic motion information.
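The derivation of dynamic motion information from two consecutive filtered states can be sketched as follows. The field names and the thresholds (`max_accel`, `max_turn_deg`, `min_speed`) are hypothetical values chosen for illustration; the description only states that preset parameter thresholds are used.

```python
import math
from dataclasses import dataclass


@dataclass
class MotionState:
    x: float    # position (m)
    y: float
    vx: float   # velocity (m/s)
    vy: float


def heading_deg(vx, vy):
    """Direction of motion in degrees, measured from the +x axis."""
    return math.degrees(math.atan2(vy, vx))


def dynamic_motion_info(prev, curr, dt, max_accel=3.0, max_turn_deg=60.0,
                        min_speed=0.05):
    prev_speed = math.hypot(prev.vx, prev.vy)
    speed = math.hypot(curr.vx, curr.vy)
    accel = (speed - prev_speed) / dt
    turn = 0.0
    # Heading is only meaningful while the target is actually moving.
    if prev_speed > min_speed and speed > min_speed:
        diff = heading_deg(curr.vx, curr.vy) - heading_deg(prev.vx, prev.vy)
        turn = abs((diff + 180.0) % 360.0 - 180.0)   # wrap to [0, 180]
    return {
        "position": (curr.x, curr.y),
        "speed": speed,
        "heading_deg": heading_deg(curr.vx, curr.vy),
        "acceleration": accel,
        "abnormal_speed_change": abs(accel) > max_accel,
        "sharp_turn": turn > max_turn_deg,
    }


turning = dynamic_motion_info(MotionState(0, 0, 1, 0),
                              MotionState(0.1, 0, 0, 1), dt=0.1)
sprinting = dynamic_motion_info(MotionState(0, 0, 0.5, 0),
                                MotionState(0.05, 0, 1.5, 0), dt=0.1)
```

In this sketch `turning` flags a sharp deflection (a 90 degree change at constant speed), while `sprinting` flags a sudden speed change without any change of direction.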
[0071] Please refer to Figure 6. According to some embodiments of the present invention, in step 107, generating control instructions based on the risk assessment results may specifically include, but is not limited to, the following: 601. Match the preset response strategy library according to the level of the risk assessment result, and determine the basic response strategy; 602. Parameterize the basic response strategy based on the real-time dynamic motion information of the target person and the state of the mobile robot to generate at least one preliminary control command; 603. Arrange the timing of the at least one preliminary control command to generate a control command sequence.
[0072] In this embodiment, a pre-defined response strategy library is matched according to the risk assessment result level, and a basic response strategy is determined. Specifically, the main control module of the mobile robot pre-stores a complete response strategy library. This strategy library is built based on data analysis of a large number of real-world application scenarios and expert experience, and it corresponds one-to-one with the risk assessment result level. Each basic response strategy in the strategy library clearly defines the robot's core response direction and operation type under the corresponding risk level.
[0073] Specifically, the main control module first retrieves the risk assessment results of the target personnel, extracts the risk level tags, and then matches these tags with strategy entries in the response strategy library: when the risk level is low, the matched basic response strategy is "continuous tracking and monitoring, no proactive intervention." When the risk level is medium, the matched basic response strategy is "proactively approaching and inquiring, strengthening status monitoring." When the risk level is high, the matched basic response strategy is "emergency early warning reporting, full-process monitoring," with operations including triggering audible and visual warning devices, sending alarm information to the monitoring center, and continuously recording the target personnel. Through the above matching process, the main control module can quickly determine the basic response strategy suitable for the current risk level. After determining the basic response strategy, the strategy is parameterized based on the real-time dynamic motion information of the target person and the mobile robot's own state to generate at least one preliminary control command. It should be noted that the real-time dynamic motion information of the target person refers to data such as position coordinates, speed, and direction of movement continuously updated by the multi-target tracking module. The mobile robot's own state includes key parameters such as remaining battery power, the distribution of obstacles in the environment, the operational status of sensing devices, and the real-time distance to the target person.
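The level-to-strategy matching in this paragraph amounts to a table lookup, sketched below. The dictionary layout and the function name are assumptions; only the three levels and the strategy wording come from the text, and levels that the paragraph does not describe return `None` rather than a guessed strategy.

```python
# Strategy entries as worded in the description; the structure is assumed.
STRATEGY_LIBRARY = {
    "low": {
        "strategy": "continuous tracking and monitoring, no proactive intervention",
        "operations": [],
    },
    "medium": {
        "strategy": "proactively approaching and inquiring, strengthening status monitoring",
        "operations": [],
    },
    "high": {
        "strategy": "emergency early warning reporting, full-process monitoring",
        "operations": [
            "trigger audible and visual warning devices",
            "send alarm information to the monitoring center",
            "continuously record the target personnel",
        ],
    },
}


def match_strategy(risk_level):
    """Return the basic response strategy entry for a risk level tag, or None."""
    return STRATEGY_LIBRARY.get(risk_level)


basic = match_strategy("high")
```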
[0074] In this way, the abstract operational requirements of the basic response strategy are converted into specific, quantifiable parameter instructions. For example, when the basic response strategy is "proactively approaching and inquiring," the main control module first retrieves the real-time motion information of the target person. If the target person is stationary, the approach speed parameter is set to 0.1 m/s; if the target person is moving slowly, the approach speed parameter is adjusted to 0.15 m/s to match the target person's movement speed. At the same time, the robot's remaining battery power is taken into account: if the battery level is above 50%, the approach distance is set to 1.5 m; if the battery level is below 50%, the approach distance is set to 2 m, so that insufficient battery power does not affect subsequent operations. By assigning parameter values to the operations of the basic response strategy, the main control module ultimately generates at least one preliminary control command. Each preliminary control command includes specific content such as operation type, execution parameters, and triggering conditions, so that it can be directly recognized by the execution mechanism.
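The example parameter values in this paragraph can be written as a small function. The numeric values (0.1 m/s, 0.15 m/s, 1.5 m, 2 m, 50%) come from the text; the function name, the `stationary_below_mps` cutoff for "stationary", and the handling of a battery level of exactly 50% (treated here as the conservative 2 m case) are assumptions.

```python
def parameterize_approach(target_speed_mps, battery_percent,
                          stationary_below_mps=0.05):
    """Turn the "approach and inquire" strategy into concrete parameters.

    stationary_below_mps is a hypothetical cutoff; the description does not
    define when a target counts as stationary.
    """
    # 0.1 m/s for a stationary target, 0.15 m/s for a slowly moving one.
    approach_speed = 0.1 if target_speed_mps < stationary_below_mps else 0.15
    # Keep a larger distance when the battery is not above 50%.
    approach_distance = 1.5 if battery_percent > 50 else 2.0
    return {
        "operation": "approach",
        "approach_speed_mps": approach_speed,
        "approach_distance_m": approach_distance,
    }


cmd_still = parameterize_approach(target_speed_mps=0.0, battery_percent=80)
cmd_moving = parameterize_approach(target_speed_mps=0.3, battery_percent=30)
```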
[0075] Finally, at least one preliminary control command is time-sequentially orchestrated to generate a control command sequence. Specifically, the main control module first categorizes the preliminary control commands into different types, such as "sensing commands," "motion commands," "interaction commands," and "communication commands," and then sorts them according to the execution logic and priority of each type of command. Through time-sequential orchestration, the originally independent multiple preliminary control commands are integrated into a logically clear and time-defined control command sequence. This sequence can be directly sent to the robot's motion control module, sensing module, interaction module, and communication module, ensuring that each module coordinates and completes the early warning response operation in an orderly manner.
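The orchestration step can be sketched as a sort over command categories. The four category names come from the paragraph above; the relative order in `TYPE_ORDER`, the `priority` field, and the example commands are assumptions, since the text does not fix a particular ordering.

```python
from dataclasses import dataclass

# Hypothetical execution order of the four command categories.
TYPE_ORDER = {"sensing": 0, "communication": 1, "motion": 2, "interaction": 3}


@dataclass
class Command:
    kind: str           # one of the keys of TYPE_ORDER
    action: str
    priority: int = 0   # lower values run earlier within the same kind


def orchestrate(commands):
    """Return (step_index, command) pairs ordered by category, then priority."""
    ordered = sorted(commands, key=lambda c: (TYPE_ORDER[c.kind], c.priority))
    return list(enumerate(ordered))


sequence = orchestrate([
    Command("interaction", "play voice inquiry"),
    Command("motion", "approach target at 0.1 m/s"),
    Command("communication", "send alarm to monitoring center"),
    Command("sensing", "point camera at target"),
])
order = [cmd.kind for _, cmd in sequence]
```

Because Python's sort is stable, commands with the same category and priority keep their original relative order, which keeps the resulting sequence deterministic.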
[0076] Please refer to Figure 7. The second aspect of this application provides a personnel status perception device based on a mobile robot, the device comprising: a first acquisition unit 701, used to acquire multimodal perception data of the environment surrounding the mobile robot; a second acquisition unit 702, used to continuously track at least one human target in the environment surrounding the mobile robot based on the multimodal perception data, and to acquire the dynamic motion information of the human target; a first determining unit 703, used to determine the perception triggering condition; a third acquisition unit 704, used to acquire at least one frame of image to be analyzed associated with the target person when it is determined that the target person among the at least one human target meets the perception triggering condition; an extraction unit 705, used to analyze the at least one frame of image to be analyzed in order to extract the structured feature set of the target person, the structured feature set including structured feature information reflecting the state of the target person; a second determining unit 706, used to determine the risk assessment result corresponding to the target person based on the structured feature set and the dynamic motion information; a generation unit 707, used to generate control instructions based on the risk assessment result; and an execution unit 708, used to control the mobile robot to perform an early warning response operation according to the control instructions.
[0077] Please refer to Figure 8. This application also provides a personnel status perception device based on a mobile robot, the device comprising: a processor 801, a memory 802, an input/output unit 803, and a bus 804. The processor 801 is connected to the memory 802, the input/output unit 803, and the bus 804. The memory 802 stores a program, and the processor 801 calls the program to execute any of the methods described above.
[0078] This application also relates to a computer-readable storage medium on which a program is stored, which, when run on a computer, causes the computer to perform any of the methods described above.
[0079] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.
[0080] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces, or indirect coupling or communication connection between apparatuses or units, and may be electrical, mechanical, or other forms.
[0081] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0082] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0083] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
Claims
1. A method for perceiving the state of people based on a mobile robot, characterized in that the method includes: acquiring multimodal perception data of the environment surrounding the mobile robot; continuously tracking, based on the multimodal perception data, at least one human target in the environment surrounding the mobile robot, and obtaining the dynamic motion information of the human target; determining a perception triggering condition; when it is determined that a target person among the at least one human target meets the perception triggering condition, acquiring at least one frame of image to be analyzed that is associated with the target person; analyzing the at least one frame of the image to be analyzed to extract a structured feature set of the target person, the structured feature set including structured feature information reflecting the state of the target person; determining the risk assessment result corresponding to the target person based on the structured feature set and the dynamic motion information; generating control instructions based on the risk assessment result; and controlling the mobile robot to perform an early warning response operation according to the control instructions.
2. The method for personnel state perception based on mobile robots according to claim 1, characterized in that analyzing at least one frame of the image to be analyzed to extract a structured feature set of the target person includes: person target detection and key point localization are performed on at least one frame of the image to be analyzed to obtain at least one set of human body key point coordinate sequence of the target person, wherein the human body key points include the positions of the main joints and contour feature points of the human body; human body parts are analyzed and posture is predicted based on the human body key point coordinate sequence to generate a posture skeleton model of the target person; the posture feature vector of the target person is calculated based on the posture skeleton model, the posture feature vector being used to describe the body orientation, limb extension and overall posture category of the target person; based on the at least one frame of the image to be analyzed and the sequence of human body key point coordinates, the target area where the target person is located is cropped and aligned to obtain the facial area image and upper body area image of the target person; facial feature analysis is performed on the facial region image to extract facial feature vectors, which include at least one of expression classification information, gaze direction prediction, and head posture angle; the upper body region image is subjected to appearance attribute recognition to extract appearance attribute feature vectors, which include at least one of clothing color, texture, clothing category, and whether glasses, hats, or masks are worn; the at least one frame of the image to be analyzed is subjected to associated target detection to identify the items held or accompanied by the target person, and the associated item feature vector is extracted, the associated item feature vector including the item's category, size, and relative positional relationship with the person; the posture feature vector, the facial feature vector, the appearance attribute feature vector, and the associated item feature vector are fused and encoded to form a structured feature set of the target person.
3. The method for personnel status perception based on mobile robots according to claim 1, characterized in that, When it is determined that a target person among the at least one personnel target meets the perception trigger condition, at least one frame of image to be analyzed associated with the target person is acquired, including: When it is determined that the target person meets the perception triggering condition, the target person is locked from the current environment scene as the individual to be perceived based on the multimodal perception data; Obtain the real-time spatial coordinates of the target person in the robot coordinate system; Control parameters are generated based on the real-time spatial coordinates. The control parameters are used to control at least one image sensor of the mobile robot so that the center of the observation field of the image sensor covers the target person. The image sensor is controlled according to the control parameters to acquire at least one frame of the target person as the image to be analyzed.
4. The method for human state perception based on a mobile robot according to claim 1, characterized in that, Determining the risk assessment result corresponding to the target person based on the structured feature set and the dynamic motion information includes: The structured feature set and dynamic motion information of the target person are correlated and fused to generate descriptive information of the target person's current behavior pattern; Based on a preset risk assessment model, the descriptive information is compared and analyzed with multiple predefined risk patterns; Based on the analysis results, a quantitative risk assessment result for the target personnel is output, which includes risk level identification and judgment elements.
5. The method for personnel status perception based on mobile robots according to claim 1, characterized in that, Based on the multimodal perception data, at least one human target in the environment surrounding the mobile robot is continuously tracked, and dynamic motion information of the human target is acquired, including: Based on the multimodal perception data, at least one potential human target in the current environmental frame is identified using a target detection algorithm; Acquire initial observation data for at least one potential human target, the initial observation data including location and geometric attribute data; The motion state of the personnel target is updated in real time by fusing the initial observation data using a time-series filtering algorithm. The dynamic motion information is calculated and output based on the updated motion state.
6. The method for personnel state perception based on a mobile robot according to claim 1, characterized in that, Based on the risk assessment results, control instructions are generated, including: Based on the risk assessment results, a preset response strategy library is matched and a basic response strategy is determined. The basic response strategy is parameterized based on the real-time dynamic motion information of the target person and the state of the mobile robot to generate at least one preliminary control command. The timing of the at least one preliminary control command is arranged to generate a control command sequence.
7. The method for personnel state perception based on a mobile robot according to claim 1, characterized in that, The control command controls the mobile robot to perform an early warning response operation, including: The control commands are parsed to obtain the command type and associated parameters. The command types include communication warning, navigation motion, and multimodal interaction. The associated parameters include target location, warning level, interaction content, and target personnel identification information. The mobile robot's response behavior sequence is planned according to the instruction type and associated parameters, and the mobile robot is driven to perform the response operation.
8. A personnel status perception device based on a mobile robot, characterized in that the device includes: a first acquisition unit, used to acquire multimodal perception data of the environment surrounding the mobile robot; a second acquisition unit, used to continuously track at least one human target in the environment surrounding the mobile robot based on the multimodal perception data, and to acquire the dynamic motion information of the human target; a first determining unit, used to determine the perception triggering condition; a third acquisition unit, used to acquire at least one frame of image to be analyzed associated with the target person when it is determined that the target person among the at least one human target meets the perception triggering condition; an extraction unit, used to analyze the at least one frame of the image to be analyzed in order to extract a structured feature set of the target person, the structured feature set including structured feature information reflecting the state of the target person; a second determining unit, used to determine the risk assessment result corresponding to the target person based on the structured feature set and the dynamic motion information; a generation unit, used to generate control instructions based on the risk assessment result; and an execution unit, used to control the mobile robot to perform an early warning response operation according to the control instructions.
9. A personnel status perception device based on a mobile robot, characterized in that the device includes: a processor, a memory, an input/output unit, and a bus; the processor is connected to the memory, the input/output unit, and the bus; the memory stores a program, which the processor invokes to perform the method as described in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that, The computer-readable storage medium contains a program that, when executed on a computer, performs the method as described in any one of claims 1 to 7.