AIGC-based industrial production environment multi-modal monitoring system and method

By setting up a multimodal monitoring system with fixed cameras and AGV-carried cameras in the workshop, combined with AIGC models and operator behavior analysis, the problem of missed detection of abnormal events in unimportant areas of the workshop was solved, and efficient and accurate anomaly detection was achieved.

CN120568015B · Active · Publication Date: 2026-06-30 · SUZHOU HUAKESHI TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: SUZHOU HUAKESHI TECHNOLOGY CO LTD
Filing Date: 2025-05-27
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, unimportant areas in the workshop, such as walls and roofs, are not equipped with monitoring cameras, which makes it difficult to detect abnormal events such as dripping water from air conditioners or peeling paint in a timely manner. Moreover, relying solely on camera monitoring images to determine abnormal events can easily lead to missed detections.

Method used

An AIGC-based multimodal monitoring system for industrial production environments is adopted. Important areas are monitored by fixed first and second cameras, and images of other areas are captured by an AGV-carried third camera with an adjustable viewing angle. The system uses workers' head movements and gaze directions to identify areas that may be abnormal, uses an AIGC model to generate text descriptions of abnormal events, and improves detection accuracy by combining voice inquiry with image analysis.

Benefits of technology

While optimizing cost-effectiveness, it can promptly detect abnormal events in the workshop, improving the accuracy and efficiency of monitoring and reducing interference with AGV operations.

✦ Generated by Eureka AI based on patent content.


Abstract

This application relates to the field of industrial production environment monitoring technology, and in particular to an AIGC-based multimodal monitoring system and method for industrial production environments. The multimodal monitoring system includes: a first camera and a second camera, each installed at a fixed position within a workshop, respectively monitoring a first area and a second area within the workshop; multiple AGVs that move within the workshop, each equipped with a third camera with an adjustable viewing angle; and a control device communicatively connected to the first camera, the second camera, and the multiple AGVs, and including an AIGC model. The control device is configured to: if the first camera senses a head-up movement of a first worker in the first area, and the second camera senses a head-up movement of a second worker in the second area, and both the gazes of the first worker and the second worker are pointing towards the same third area, then control the third camera of the first AGV among the multiple AGVs to capture a first image of the third area.

Description

Technical Field

[0001] This application relates to the field of industrial production environment monitoring technology, and in particular to an AIGC-based multimodal monitoring system and method for industrial production environments.

Background Technology

[0002] To monitor the production environment within a workshop, multiple cameras are typically deployed, each responsible for monitoring different areas. Generally, the more cameras there are, the better the coverage of the workshop environment. However, this also leads to increased hardware and energy costs, as well as a heavier data processing load on the image analysis equipment connected to these cameras. Therefore, more cameras are not necessarily better.

[0003] Therefore, at least from a cost-effectiveness perspective, cameras are typically only installed in important areas of the workshop, while less important areas are not specifically equipped with monitoring cameras. For example, important areas are workstations where production activities take place, while less important areas are wall areas and roof areas where no production activities take place. However, the drawback of this design is that when abnormal events occur in less important areas of the workshop, such as dripping water from air conditioners or peeling paint, they may not be detected in a timely manner.

[0004] In addition, in traditional technology, whether an abnormal event exists in a camera's monitoring area is usually determined solely from that camera's monitoring image. As a result, some abnormal events (especially in their early stages) may not be clearly visible in the monitoring image and thus may not be detected in time.

Summary of the Invention

[0005] The purpose of this application is to at least partially solve at least one of the above-mentioned technical problems by proposing an AIGC-based multimodal monitoring system and method for industrial production environments.

[0006] In a first aspect, an AIGC-based multimodal monitoring system for industrial production environments is proposed, including:

[0007] The first camera and the second camera are each installed in a fixed position inside the workshop to monitor the first area and the second area inside the workshop, respectively.

[0008] Multiple AGVs move within the workshop, each equipped with a third camera with an adjustable viewing angle;

[0009] A control device, communicatively connected to the first camera, the second camera, and the plurality of AGVs, and including an AIGC model, is used for:

[0010] If the first camera senses the head-up movement of the first worker in the first area, the second camera senses the head-up movement of the second worker in the second area, and both the first worker's gaze and the second worker's gaze are pointing towards the same third area, then the third camera of the first AGV among the plurality of AGVs is controlled to shoot a first image toward the third area.

[0011] In some possible implementations, the control device is further used for:

[0012] Determine whether there are any abnormal events in the first image;

[0013] If an abnormal event is determined to exist in the first image, a second image is generated based on the first image using the AIGC model. The second image includes text describing the abnormal event.

[0014] In some possible implementations, the control device is further used for:

[0015] The first image and the second image are displayed on the display device in the monitoring room.

[0016] In some possible implementations, before controlling the third camera of the first of the plurality of AGVs to take a first image toward the third region, the control device is further configured to:

[0017] Control the first AGV to travel to a target area, the target area being one of the first area and the second area.

[0018] In some possible implementations, before controlling the first AGV to travel to the target area, the control device is further configured to:

[0019] From the plurality of AGVs, identify candidate AGVs that are not currently in a cargo handling state;

[0020] If there is only one candidate AGV, then the candidate AGV is determined as the first AGV, and whichever of the first area and the second area is closer to the first AGV is determined as the target area.

[0021] In some possible implementations, the control device is further used for:

[0022] If there are multiple candidate AGVs, then determine the first candidate AGV that is closest to the first region and the second candidate AGV that is closest to the second region from among the multiple candidate AGVs;

[0023] Compare the first distance between the first candidate AGV and the first region and the second distance between the second candidate AGV and the second region;

[0024] If the first distance is less than the second distance, the first candidate AGV is determined as the first AGV, and the first region is determined as the target region; if the first distance is greater than the second distance, the second candidate AGV is determined as the first AGV, and the second region is determined as the target region.

[0025] In some possible implementations, after controlling the first AGV to travel to the target area, the control device is further configured to:

[0026] Control the first AGV to send a preset voice inquiry message to the target operator, wherein the target operator is the first operator or the second operator located in the target area, and the voice inquiry message prompts the target operator to describe what he or she has observed;

[0027] Receive voice response information from the target operator;

[0028] The step of determining whether there is an abnormal event in the first image includes:

[0029] Based on the voice response information and the content of the first image, determine whether there is an abnormal event in the first image.

[0030] In some possible implementations, the control device is further used for:

[0031] If no voice response is received from the target operator within a preset time frame after the first AGV sends a voice inquiry to the target operator, then the presence of an abnormal event in the first image is determined based on the content of the first image.

[0032] In some possible implementations, if the first camera senses the head-up movement of a first worker in the first area, the second camera senses the head-up movement of a second worker in the second area, and both the gazes of the first worker and the second worker are pointing towards the same third area, then controlling the third camera of the first AGV among the plurality of AGVs to capture a first image towards the third area includes:

[0033] If at a first moment the first camera senses the head-up movement of the first worker in the first area, the second camera senses the head-up movement of the second worker in the second area, and both the first worker's gaze and the second worker's gaze are pointing towards the same third area, and at a second moment the first camera senses the head-up movement of the first worker, the second camera senses the head-up movement of the second worker, and both the first worker's gaze and the second worker's gaze are pointing towards the third area, then the third camera of the first AGV among the plurality of AGVs is controlled to capture a first image toward the third area, wherein the second moment is a moment after the first moment and at a time interval of a set duration from the first moment.

[0034] In a second aspect, an AIGC-based multimodal monitoring method for industrial production environments is proposed, applicable to the system described in any implementation of the first aspect, wherein the method includes:

[0035] When the first camera senses the head-up movement of the first worker in the first area, the second camera senses the head-up movement of the second worker in the second area, and the gazes of the first worker and the second worker are both pointing towards the same third area, the third camera of the first AGV among the plurality of AGVs is controlled to shoot a first image toward the third area.

[0036] Determine whether there are any abnormal events in the first image;

[0037] If an abnormal event is determined to exist in the first image, a second image is generated based on the first image using the AIGC model. The second image includes text describing the abnormal event.

[0038] According to the AIGC-based multimodal monitoring system and method for industrial production environments provided in this application, abnormal events in the workshop can be detected in a timely manner while optimizing cost-effectiveness.

Attached Figure Description

[0039] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings of the embodiments will be briefly described below. Obviously, the drawings described below only relate to some embodiments of this application, and are not intended to limit this application.

[0040] Figure 1 is a schematic diagram of part of a factory equipped with a monitoring system, as provided in an embodiment of this application.

[0041] Figure 2 is a flowchart of the monitoring method provided in the embodiments of this application.

[0042] Figure 3 is a flowchart of the monitoring method provided in the embodiments of this application.

[0043] Explanation of reference numerals in the attached figures:

[0044] 1000 - Factory;

[0045] 100 - Monitoring system; 200 - Monitoring method;

[0046] 1 - Workshop, 1A - First area, 1B - Second area, 1C - Third area;

[0047] 2 - Monitoring room;

[0048] 3 - First camera, 4 - Second camera;

[0049] 5 - First operator, 6 - Second operator, 7 - Monitoring personnel;

[0050] 8 - AGV, 9 - First AGV;

[0051] 10 - Control device;

[0052] 11 - Production equipment;

[0053] 12 - Air conditioner;

[0054] 13 - Shelf;

[0055] 14 - Camera;

[0056] 15 - Display screen;

[0057] 16 - Goods;

[0058] 17 - Worker;

[0059] 18 - Third camera;

[0060] 19 - Lighting lamp.

Detailed Implementation

[0061] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. Based on the described embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application. It is understood that, without conflict, some technical means of the various embodiments described herein can be substituted for or combined with each other.

[0062] In the description of this application, the terms "first," "second," etc., are used only to distinguish the described objects and have no sequential or technical meaning. Therefore, objects specified with "first," "second," etc., may explicitly or implicitly include one or more of those objects, and, for example, the term "first element" itself does not imply the existence of a "second element," nor does the term "second element" itself imply the existence of a "first element." Furthermore, words such as "a" or "one" do not indicate a quantity limitation, but rather indicate the presence of at least one, while "multiple" indicates not less than two.

[0063] In the description of this application, the terms "comprising" or "having" indicate the presence of the said features, numbers, operations, elements, and / or combinations thereof, but do not exclude the presence or addition of one or more other features, numbers, operations, elements, and / or combinations thereof.

[0064] In the description of this application, references to "one embodiment" or "some embodiments" mean that one or more embodiments of this application include a specific feature, structure, or characteristic described in connection with that embodiment. Therefore, the phrases "in one embodiment," "in some embodiments," "in other embodiments," "in still other embodiments," etc., appearing in different parts of this specification do not necessarily refer to the same embodiment, but rather mean "one or more, but not all, embodiments," unless otherwise specifically emphasized.

[0065] Figure 1 illustrates an AIGC-based multimodal monitoring system 100 for industrial production environments provided by this application, applied to a factory 1000 having a workshop 1 and a monitoring room 2. Workshop 1 is equipped with various production equipment 11, AGVs 8 for transporting goods 16, shelves 13 for storing raw materials or products, and cameras 14 for monitoring, and serves as a production activity area where multiple workers 17 perform production operations. The monitoring room 2 is equipped with a number of display devices communicatively connected to the cameras 14 in workshop 1. Workers 17 acting as monitoring personnel 7 monitor the situation in workshop 1 by observing the images shown on these display devices. A factory is typically configured with multiple workshops 1, and the monitoring room 2 is physically separated from these workshops 1. For simplicity and ease of understanding, Figure 1 shows only one workshop 1.

[0066] Cameras 14 (excluding the third camera 18 of AGV8) configured within workshop 1 include a first camera 3 and a second camera 4. The first camera 3 and the second camera 4 are each fixedly installed within workshop 1, respectively monitoring a first area 1A and a second area 1B within workshop 1. Exemplarily, both the first camera 3 and the second camera 4 are locked and fixed to the indoor ceiling of workshop 1, with the lens of the first camera 3 fixed to face the first area 1A and the lens of the second camera 4 fixed to face the second area 1B. The first area 1A and the second area 1B are completely separate areas. In other embodiments, the first area 1A and the second area 1B may partially overlap. Additionally, the other cameras 14 configured within workshop 1 besides the first camera 3 and the second camera 4 (e.g., the camera 14 located below the first camera 3 in Figure 1) are also installed at fixed positions in workshop 1 and monitor, from fixed perspectives, the workshop areas other than the first area 1A and the second area 1B.

[0067] By configuring each camera 14 within workshop 1, including the first camera 3 and the second camera 4, to have a fixed monitoring area, it is helpful for the monitoring personnel 7 and the monitoring system 100 to easily determine the situation in each area of workshop 1, especially when the monitoring personnel 7 have been working at the factory 1000 for a long time and are familiar with each camera 14 and the workshop area corresponding to its captured image. Furthermore, the camera 14 can be a monocular camera or a multi-view camera.

[0068] As described in the background section of this application, in order to pursue cost-effectiveness, the first area 1A and the second area 1B, corresponding to the first camera 3 and the second camera 4, are the key areas requiring close monitoring within the workshop 1. More specifically, they are the first workstation area for performing the first processing of the workpiece and the second workstation area for performing the second processing of the workpiece, respectively, and at least one operator 17 is stationed in each of the first and second workstation areas. During production operations, the operators 17 in each workstation area generally do not wander to other workstation areas, and according to the constraints of the rules and regulations, even if one or more operators 17 in a certain workstation area leave due to other matters, at least one operator 17 will generally remain in that workstation area. Therefore, during production operations, under normal circumstances, the first camera 3 can always monitor the operator 17 in the first area 1A, and the second camera 4 can always monitor the operator 17 in the second area 1B. In addition, the workshop 1 is not additionally equipped with cameras 14 for monitoring unimportant areas, such as cameras 14 for monitoring the condition of the roof and walls of the workshop 1.

[0069] Multiple AGV8s (Automated Guided Vehicles) move within workshop 1 to transport corresponding goods 16, such as moving processed workpieces to shelves 13 within workshop 1. Furthermore, each AGV8 has its own adjustable-view third camera 18, which is used for navigation in some embodiments.

[0070] The monitoring system 100 also includes a control device 10, which is a computer device including a memory, a processor, and a communication unit. This control device 10 is installed in the monitoring room 2; in other embodiments, it is installed in a separate computer room. The control device 10 uses its communication unit to communicate with the first camera 3, the second camera 4, and multiple AGVs 8, thereby acquiring information from and controlling these devices. The control device 10 can communicate with the first camera 3 and the second camera 4 via both wired and wireless means, but generally only wirelessly with the AGVs. Furthermore, the control device 10 also communicates with other cameras 14 located in the workshop 1, besides the first camera 3 and the second camera 4, thereby acquiring information from and controlling these other cameras 14.

[0071] Referring now to Figure 2, the control device 10 stores a computer program in its memory, and a trained AIGC (Artificial Intelligence Generated Content) model is embedded in the computer program. When the computer program is invoked and executed by the processor, the control device 10 is able to monitor the environment of workshop 1 by performing the monitoring method 200 shown in Figure 2, which includes the following steps S201 to S204:

[0072] S201, if the first camera 3 senses the head-up movement of the first operator 5 in the first area 1A, the second camera 4 senses the head-up movement of the second operator 6 in the second area 1B, and the gazes of the first operator 5 and the second operator 6 are both pointing to the same third area 1C, then the third camera 18 of the first AGV9 among the multiple AGVs is controlled to shoot a first image toward the third area 1C.

[0073] S202, Determine whether there is an abnormal event in the first image.

[0074] S203, if an abnormal event is determined to exist in the first image, a second image is generated based on the first image using the AIGC model. The second image includes text describing the abnormal event.

[0075] S204, causing the first and second images to be displayed on the display device in monitoring room 2.
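The order of steps S201 to S204 can be sketched in Python as follows. This is only an illustrative outline: the five callables passed in (gaze detection, image capture, anomaly check, AIGC generation, display) are hypothetical placeholders for components that this application describes but does not specify at code level.

```python
from typing import Any, Callable, Optional, Tuple


def run_monitoring_cycle(
    detect_third_area: Callable[[], Optional[str]],
    capture_first_image: Callable[[str], Any],
    has_abnormal_event: Callable[[Any], bool],
    generate_second_image: Callable[[Any], Any],
    display: Callable[[Any, Any], None],
) -> Optional[Tuple[Any, Any]]:
    """Run one pass through S201 to S204 (all callables are hypothetical)."""
    # S201: proceed only if both workers look up toward the same third area.
    third_area = detect_third_area()
    if third_area is None:
        return None
    first_image = capture_first_image(third_area)

    # S202: decide whether the first image contains an abnormal event.
    if not has_abnormal_event(first_image):
        return None

    # S203: the AIGC model produces a second image carrying descriptive text.
    second_image = generate_second_image(first_image)

    # S204: show both images on the display device in the monitoring room.
    display(first_image, second_image)
    return first_image, second_image
```

Passing the components in as callables keeps the sketch independent of any particular camera, AGV, or model interface.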

[0076] The control device 10 can acquire real-time images of the first area 1A from the first camera 3 and real-time images of the second area 1B from the second camera 4. Normally, the operators 17 in the first area 1A and the second area 1B focus only on their own workstations, so the likelihood of them simultaneously looking up at the same high area is low. However, if at a certain time both the first operator 5 in the first area 1A and the second operator 6 in the second area 1B look up, and both gazes point towards the same high area, this indicates that there may be an anomaly in that high area of workshop 1, namely the third area 1C. Therefore, when the control device 10 becomes aware of this situation, it can control the third camera 18 of the corresponding first AGV9 to capture a first image of the third area 1C, so as to further determine whether there is an anomaly in the third area 1C based on the content of the first image.

[0077] Since methods for determining a person's head posture and gaze direction using visual technology are well known to those skilled in the art, they will not be described in detail. In some embodiments, the method for determining the gaze direction of operators 5 and 6 can be simplified to: taking the facial orientation of operators 5 and 6 as their gaze direction. In still other embodiments, the method for determining the gaze direction of operators 5 and 6 can be: infrared light emitted by the infrared light emitters at cameras 3 and 4 is reflected by the glasses of operators 5 and 6, and the reflected infrared light can be sensed by cameras 3 and 4, thereby determining the gaze direction of operators 5 and 6.
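One way to test whether two gazes point to the same high area is sketched below in Python. It treats each gaze (or face orientation) as a 3D ray in workshop coordinates and checks whether the two rays come within a tolerance of each other. This application does not prescribe this geometry: the coordinate frame, the 20-degree head-up threshold, the 1-metre tolerance, and all names are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Gaze:
    origin: Vec     # head position in workshop coordinates, metres (assumed known)
    direction: Vec  # gaze or face-orientation vector; z points up


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _point(g: Gaze, t: float) -> Vec:
    return tuple(o + t * d for o, d in zip(g.origin, g.direction))


def is_head_up(g: Gaze, min_pitch_deg: float = 20.0) -> bool:
    """Head-up means the gaze pitch exceeds an assumed threshold."""
    dx, dy, dz = g.direction
    pitch = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return pitch >= min_pitch_deg


def common_gaze_target(g1: Gaze, g2: Gaze, tol_m: float = 1.0) -> Optional[Vec]:
    """Return the point both gazes converge on, or None if they do not."""
    if not (is_head_up(g1) and is_head_up(g2)):
        return None
    # Closest approach of the two lines origin + t * direction.
    w = tuple(p - q for p, q in zip(g1.origin, g2.origin))
    a, b, c = _dot(g1.direction, g1.direction), _dot(g1.direction, g2.direction), _dot(g2.direction, g2.direction)
    d, e = _dot(g1.direction, w), _dot(g2.direction, w)
    denom = a * c - b * b
    if denom < 1e-9:          # parallel gazes never converge
        return None
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    if t1 <= 0 or t2 <= 0:    # closest point lies behind an operator
        return None
    p1, p2 = _point(g1, t1), _point(g2, t2)
    if math.dist(p1, p2) > tol_m:
        return None
    return tuple((u + v) / 2 for u, v in zip(p1, p2))
```

The returned point could then be mapped to a named area such as the third area 1C; that mapping depends on the workshop layout and is left out here.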

[0078] It is understandable that two operators 17 in different workstation areas are less likely to look up at the same high area simultaneously than two operators 17 in the same workstation area (for the latter, consider a noticeable event occurring at a slightly higher position of the production equipment 11 at a certain workstation, especially at workstations where the operator 17 is required to look up as part of production operations). Therefore, when the former occurs, that is, when operators 17 in different workstation areas look up at the same high area, it is more necessary to suspend the operation of the corresponding AGV8 and use the third camera 18 on it to take an image of the high area.

[0079] Even if no abnormal event occurs in a certain high area (such as the air conditioner 12 area or the lighting 19 area on the roof), it is not uncommon for two workers 17, such as the first worker 5 and the second worker 6, located in different workstations, to simultaneously look up at that high area, for example, when both workers 17 are unconsciously looking at the same location on the roof of workshop 1. Therefore, if the AGV8 is simply suspended from its primary function (transportation work) once the situation described in S201 occurs, it is likely to unnecessarily reduce the efficiency of the AGV8.

[0080] In some implementations, step S201 specifically includes: if at a first moment, the first camera 3 senses the head-up movement of the first worker 5 in the first area 1A, the second camera 4 senses the head-up movement of the second worker 6 in the second area 1B, and the gazes of the first worker 5 and the second worker 6 are both pointing to the same third area 1C, and at a second moment, the first camera 3 senses the head-up movement of the first worker 5, the second camera 4 senses the head-up movement of the second worker 6, and the gazes of the first worker 5 and the second worker 6 are both pointing to the third area 1C, then the third camera 18 of the first AGV9 among the plurality of AGVs is controlled to shoot a first image toward the third area 1C, wherein the second moment is a moment after the first moment and at a time interval of a set duration from the first moment.

[0081] The aforementioned set duration can be a duration pre-written into the computer program of the control device 10. In one example, the set duration is 5 seconds. That is, when the first camera 3 and the second camera 4 sense at a first moment that both the first worker 5 and the second worker 6 have made a head-up movement and their gazes are both directed towards the third area 1C at a higher position, the timer is immediately started. When the timer reaches 5 seconds, the control device 10 will again determine, based on the monitoring images of the first camera 3 and the second camera 4 at the current moment (i.e., the second moment delayed by 5 seconds from the first moment), whether the first worker 5 and the second worker 6 are still making a head-up movement and are both observing the third area 1C. If so, the control device 10 will control the third camera 18 of the corresponding AGV8, i.e., the first AGV9, to take a first image towards the third area 1C. If not, it can be ignored. In addition, the set duration can be changed by authorized personnel (e.g., the workshop 1 manager).
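The two-moment check described above can be sketched in Python as follows. The `observe` callable is a hypothetical stand-in for the gaze analysis of the two fixed cameras: it is assumed to return an identifier of the commonly observed high area, or `None` when the condition is not met. The `sleep` parameter is injectable only so that the sketch can be exercised without waiting.

```python
import time
from typing import Callable, Optional

SET_DURATION_S = 5.0  # example value from the text; adjustable by authorized personnel


def confirmed_third_area(
    observe: Callable[[], Optional[str]],
    set_duration_s: float = SET_DURATION_S,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the area only if it is observed at both the first and second moments."""
    first = observe()            # first moment
    if first is None:
        return None
    sleep(set_duration_s)        # wait for the set duration
    second = observe()           # second moment
    # Trigger the AGV camera only if both workers still look at the same area.
    return first if second == first else None
```

A result of `None` corresponds to the "ignored" branch, so the AGV continues its transport work uninterrupted.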

[0082] If the third camera 18 of the first AGV9 is controlled to capture the first image of the third region 1C directly from the current position of the first AGV9, the first image may not accurately reflect what the first operator 5 and the second operator 6 observe in the third region 1C, owing to factors such as viewing angle. For example, the first operator 5 and the second operator 6 may observe an abnormal event in the third region 1C from the first region 1A and the second region 1B respectively (e.g., the air conditioner 12 dripping water, the lighting lamp 19 malfunctioning, the roof plaster peeling off, etc.), but an image captured toward the third region 1C from another perspective or another region may fail to show this abnormal event because the view is obstructed. In some embodiments, as shown in Figure 3, before controlling the third camera 18 of the first AGV9 among the plurality of AGV8s to capture the first image toward the third region 1C as described in step S201, method 200 further includes:

[0083] S306, control the first AGV9 to move to the target area, which is one of the first area 1A and the second area 1B.

[0084] That is, after controlling the first AGV9 to move to the first area 1A or the second area 1B, the third camera 18 of the first AGV9 is then controlled to take a first image towards the third area 1C. In this way, since the first AGV9 takes a picture of the third area 1C from the perspective of the first operator 5 or the second operator 6, the first image can better reproduce the observation of the first operator 5 or the second operator 6 of the third area 1C.

[0085] Continuing with Figure 3, before step S306, that is, before controlling the first AGV9 to move to the target area among the first area 1A and the second area 1B, method 200 further includes determining the first AGV9 from the plurality of AGV8s, specifically including:

[0086] S301, identify, from the plurality of AGV8s, candidate AGVs that are not currently in a cargo handling state.

[0087] S302, if there is only one candidate AGV, then the candidate AGV is determined as the first AGV9, and whichever of the first region 1A and the second region 1B is closer to the first AGV9 is determined as the target region.

[0088] S303, if there are multiple candidate AGVs, then determine the first candidate AGV closest to the first region 1A and the second candidate AGV closest to the second region 1B from the multiple candidate AGVs. Then, proceed to step S304.

[0089] S304, compare the first distance between the first candidate AGV and the first region 1A, and the second distance between the second candidate AGV and the second region 1B. Then, proceed to step S305.

[0090] S305, if the first distance is less than the second distance, then the first candidate AGV is determined as the first AGV9, and the first region 1A is determined as the target region; if the first distance is greater than the second distance, then the second candidate AGV is determined as the first AGV9, and the second region 1B is determined as the target region. Alternatively, if the first distance and the second distance are equal, one of the first candidate AGV and the second candidate AGV can be randomly selected as the first AGV9, and whichever of the first region 1A and the second region 1B corresponds to the randomly selected first AGV9 is determined as the target region.

[0091] The control device 10 can identify, from the plurality of AGV8s, all candidate AGVs that are not currently in a cargo handling state (e.g., an AGV that has just unloaded goods 16, is preparing to handle the next goods 16, and is temporarily empty). If there is only one such candidate AGV, that candidate AGV is identified as the first AGV9 mentioned in step S201, and whichever of the first region 1A and the second region 1B is closer to the first AGV9 is identified as the target region to which the first AGV9 will travel. If there are multiple candidate AGVs, the operations of steps S303 to S305 are used to identify one AGV8 as the first AGV9: this is the AGV8 closest to the first region 1A or the second region 1B among all the AGV8s that are not currently in a cargo handling state.

[0092] Since the first AGV9 is an AGV8 that is not currently in the cargo handling state, on the one hand, using it does not significantly affect the cargo handling efficiency of workshop 1; on the other hand, it helps to ensure the safety of method 200 during execution and helps to suppress the increase in energy consumption of the first AGV9.
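The selection logic of steps S301 to S305 can be sketched in Python as follows. Several simplifications are assumptions rather than part of this application: each area is represented by a single 2D reference point, distance is straight-line rather than path length, and the `Agv` fields and function names are hypothetical. The text does not cover the case where no AGV is idle, or a tie when only one candidate exists, so those branches are marked as assumptions.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Agv:
    agv_id: str
    position: Point
    carrying_cargo: bool  # True while in the cargo handling state


def select_first_agv(
    agvs: List[Agv], area_a: Point, area_b: Point, rng=random
) -> Optional[Tuple[Agv, str]]:
    """Return (first AGV, target area label "1A" or "1B"), or None."""
    # S301: keep only AGVs not currently handling cargo.
    candidates = [a for a in agvs if not a.carrying_cargo]
    if not candidates:
        return None  # assumption: this case is not described in the text

    # S302: a single candidate goes to whichever area is closer to it.
    if len(candidates) == 1:
        agv = candidates[0]
        d_a = math.dist(agv.position, area_a)
        d_b = math.dist(agv.position, area_b)
        return agv, ("1A" if d_a <= d_b else "1B")  # tie -> 1A is an assumption

    # S303: nearest candidate to each area.
    cand_a = min(candidates, key=lambda a: math.dist(a.position, area_a))
    cand_b = min(candidates, key=lambda a: math.dist(a.position, area_b))

    # S304: compare the two distances.
    first_distance = math.dist(cand_a.position, area_a)
    second_distance = math.dist(cand_b.position, area_b)

    # S305: the smaller distance wins; equal distances are resolved at random.
    if first_distance < second_distance:
        return cand_a, "1A"
    if first_distance > second_distance:
        return cand_b, "1B"
    return rng.choice([(cand_a, "1A"), (cand_b, "1B")])
```

For example, with area reference points at (0, 0) and (10, 0), an idle AGV at (1, 0) and another idle AGV at (8, 0), the first distance is 1 and the second distance is 2, so the AGV at (1, 0) is chosen and sent to area 1A.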

[0093] If the control device 10 determines whether an abnormal event exists solely based on the image information in the first image, the judgment result may differ from the actual result due to the limited information relied upon, especially when the image information in the first image is not clear enough. For further details, please refer to [link to relevant documentation]. Figure 3 After controlling the first AGV9 to travel to the target area within the first area 1A and the second area 1B, the method 200 further includes:

[0094] S307, control the first AGV9 to send a preset voice inquiry message to the target operator, wherein the target operator is the first operator 5 or the second operator 6 located in the target area, and the voice inquiry message is used to prompt the target operator to report what he or she has observed.

[0095] It is understood that when the target area is the first area 1A, the target operator is the first operator 5; when the target area is the second area 1B, the target operator is the second operator 6. In some embodiments, the voice inquiry information may be audio such as "Have you observed any abnormalities?" played from the sound output device of the first AGV9.

[0096] S308, receive voice response information from the target operator.

[0097] Step S308 can be performed before, after, or at the same time as controlling the third camera 18 of the first AGV9 to capture the first image of the third area 1C. Once the voice response information has been received from the target operator and the third camera 18 of the first AGV9 has captured the first image of the third area 1C, the aforementioned step S202, which involves determining whether an abnormal event exists in the first image, can specifically include: determining whether an abnormal event exists in the first image based on the voice response information and the content of the first image.

[0098] Thus, since the control device 10 determines whether there is an abnormal event in the third region 1C based not only on the content of the first image but also on the voice response information of the target operator, the accuracy and reliability of the determination result are higher. In some embodiments, the operation of step S202 can also be performed by the AIGC model located in the control device 10. As a trained artificial intelligence model, the AIGC model can determine whether there is an abnormal event in the third region 1C through joint analysis of keywords in the voice response information and the content of the first image. In one example, the voice response indicates that there is a high probability of an abnormal event in the third area 1C, but the content of the first image shows that the apparent abnormality was a visual error by the operator. In this case, the control device 10 can determine that there is no abnormal event in the first image. In another example, the voice response indicates that an abnormal event is probable in the third area 1C, and the content of the first image also shows that the abnormal event is probable in the third area 1C. In this case, the control device 10 can determine that there is an abnormal event in the first image. Since the operation of the aforementioned step S204 has then been performed, the monitoring personnel 7 on duty in the monitoring room 2 can further assess the abnormal event based on the first and second images they have observed and their own experience, and choose whether to take appropriate emergency actions.
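The two examples above can be condensed into a small decision rule. The sketch below is a hand-written stand-in used only to make the examples concrete; the text attributes the joint analysis to a trained AIGC model, not to fixed rules. The three-level `Evidence` scale and the function name `abnormal_event_exists` are hypothetical, and the two cases not covered by the examples (strong image evidence, and weak image evidence without voice support) are resolved by assumption.

```python
from enum import IntEnum


class Evidence(IntEnum):
    """Hypothetical coarse scale for how strongly a source suggests an anomaly."""
    NONE = 0
    POSSIBLE = 1
    LIKELY = 2


def abnormal_event_exists(voice: Evidence, image: Evidence) -> bool:
    """Combine the voice response and the first image into one decision."""
    # First example in the text: the operator reports a probable anomaly,
    # but the image shows it was a visual error, so no abnormal event.
    if image is Evidence.NONE:
        return False
    # Assumption: strong image evidence suffices on its own.
    if image is Evidence.LIKELY:
        return True
    # Second example in the text: the image suggests a possible anomaly
    # and the voice response also suggests one, so an abnormal event exists.
    # Assumption: without voice support, weak image evidence is not enough.
    return voice >= Evidence.POSSIBLE
```

In this sketch the image can veto the operator's report but the operator's report cannot override a clear image, which matches the first example.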

[0099] In some implementations, to attract the attention of monitoring personnel 7 and to convey the anomaly visually, the aforementioned second image generated by the AIGC model can exaggerate the anomaly. For example, when an anomaly of an air conditioner dripping water is detected in the first image, the generated second image includes the air conditioner on the roof and a large number of water droplets dripping from it, and the edge area of the second image contains text such as "Air conditioner No. 0046 in Workshop No. 3 has a dripping fault".

[0100] In some cases, after the first AGV9 sends the preset voice inquiry to the target operator, the target operator may not respond. If step S202 were executed only after the first AGV9 receives a voice response, the method 200 could not proceed in that situation. Therefore, in some embodiments, method 200 further includes: if no voice response is received from the target operator within a threshold time period after the first AGV9 sends the preset voice inquiry, determining whether an abnormal event exists in the first image based on the content of the first image alone. The threshold time period could be, for example, 5 seconds.
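The timeout fallback can be illustrated with a short Python sketch. It assumes, purely for illustration, that transcribed voice responses arrive on a `queue.Queue`; the function names and the two judge callables are hypothetical placeholders for whatever the control device 10 actually uses.

```python
import queue

RESPONSE_TIMEOUT_S = 5.0  # Example threshold time period from the text.


def wait_for_voice_response(responses: queue.Queue,
                            timeout_s: float = RESPONSE_TIMEOUT_S):
    """Return the operator's response, or None if none arrives in time."""
    try:
        return responses.get(timeout=timeout_s)
    except queue.Empty:
        return None


def judge_first_image(image_content, responses: queue.Queue,
                      judge_with_voice, judge_image_only,
                      timeout_s: float = RESPONSE_TIMEOUT_S):
    """Run the abnormal-event judgment with or without a voice response.

    judge_with_voice(response, image_content) and
    judge_image_only(image_content) are hypothetical callables.
    """
    response = wait_for_voice_response(responses, timeout_s)
    if response is None:
        # No response within the threshold: fall back to the image alone.
        return judge_image_only(image_content)
    return judge_with_voice(response, image_content)
```

Bounding the wait in this way means a silent operator delays the judgment by at most the threshold period instead of blocking it indefinitely.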

[0101] In some implementations, if it is determined in step S202 that no abnormal event exists in the first image, the control device 10 can use the AIGC model to generate a third image based on the first image and display the third image on the display device in the monitoring room 2. The third image includes text describing a first event, namely that the first operator 5 and the second operator 6 were observing the third area 1C. The third image can be formed by directly adding the text describing the first event to the first image.

[0102] In other embodiments, if it is determined in step S202 that there are no abnormal events in the first image, the first AGV9 can be controlled to immediately switch to normal operation and the first image can be stored in the memory of the control device 10 so that authorized personnel can retrieve and view it.

Claims

1. An AIGC-based multimodal monitoring system for an industrial production environment, characterized in that it includes: a first camera and a second camera, each installed at a fixed position inside a workshop to monitor a first area and a second area inside the workshop, respectively; multiple AGVs that move within the workshop, each equipped with a third camera with an adjustable viewing angle; and a control device, communicatively connected to the first camera, the second camera, and the multiple AGVs, and including an AIGC model, the control device being used for: if the first camera senses a head-raising movement of a first worker in the first area, and the second camera senses a head-raising movement of a second worker in the second area, and both the first worker's gaze and the second worker's gaze are directed towards the same third area, then: controlling a first AGV among the multiple AGVs to travel to a target area, the target area being one of the first area and the second area; controlling the third camera of the first AGV to capture, in the target area, a first image facing the third area; controlling the first AGV to send a preset voice inquiry message to a target operator, wherein the target operator is the first operator or the second operator located in the target area, and the voice inquiry message is used to prompt the target operator to report what he or she has observed; receiving voice response information from the target operator; and, based on the voice response information and the content of the first image, determining whether there is an abnormal event in the first image.

2. The system according to claim 1, characterized in that the control device is also used for: if an abnormal event is detected in the first image, using the AIGC model to generate a second image based on the first image, wherein the second image includes text describing the abnormal event.

3. The system according to claim 2, characterized in that the control device is also used for: displaying the first image and the second image on the display device in the monitoring room.

4. The system according to claim 1, characterized in that, before controlling the first AGV to travel to the target area, the control device is further configured to: identify, from the plurality of AGVs, candidate AGVs that are not currently in a cargo handling state; and, if there is only one candidate AGV, determine the candidate AGV as the first AGV, and determine whichever of the first region and the second region is closer to the first AGV as the target region.

5. The system according to claim 4, characterized in that the control device is also used for: if there are multiple candidate AGVs, determining, from among the multiple candidate AGVs, the first candidate AGV that is closest to the first region and the second candidate AGV that is closest to the second region; comparing the first distance between the first candidate AGV and the first region with the second distance between the second candidate AGV and the second region; and, if the first distance is less than the second distance, determining the first candidate AGV as the first AGV and the first region as the target region, or, if the first distance is greater than the second distance, determining the second candidate AGV as the first AGV and the second region as the target region.

6. The system according to claim 1, characterized in that the control device is also used for: if no voice response is received from the target operator within a preset time period after the first AGV sends the voice inquiry to the target operator, determining whether an abnormal event exists in the first image based on the content of the first image.

7. The system according to claim 1, characterized in that controlling the third camera of the first AGV among the plurality of AGVs to capture a first image towards the third area, if the first camera senses the head-up movement of the first worker in the first area, and the second camera senses the head-up movement of the second worker in the second area, and both the first worker's gaze and the second worker's gaze are pointing towards the same third area, includes: if at a first moment the first camera senses the head-up movement of the first worker in the first area, the second camera senses the head-up movement of the second worker in the second area, and both the first worker's gaze and the second worker's gaze are pointing towards the same third area, and at a second moment the first camera senses the head-up movement of the first worker, the second camera senses the head-up movement of the second worker, and both the first worker's gaze and the second worker's gaze are pointing towards the third area, then controlling the third camera of the first AGV among the plurality of AGVs to capture a first image towards the third area, wherein the second moment is later than the first moment and separated from it by a set duration.

8. An AIGC-based multimodal monitoring method for an industrial production environment, characterized in that it is applied to the system as described in any one of claims 1 to 7, and the method comprises: when the first camera senses the head-raising movement of the first worker in the first area, the second camera senses the head-raising movement of the second worker in the second area, and both the gaze of the first worker and the gaze of the second worker are pointing towards the same third area, controlling the first AGV among the plurality of AGVs to travel to the target area, the target area being one of the first area and the second area; controlling the third camera of the first AGV to capture a first image in the target area; controlling the first AGV to send a preset voice inquiry message to the target operator, wherein the target operator is the first operator or the second operator located in the target area, and the voice inquiry message is used to prompt the target operator to report what he or she has observed; receiving voice response information from the target operator; based on the voice response information and the content of the first image, determining whether there is an abnormal event in the first image; and, if an abnormal event is detected in the first image, using the AIGC model to generate a second image based on the first image, wherein the second image includes text describing the abnormal event.